foxcpp / maddy

✉️ Composable all-in-one mail server.
https://maddy.email
GNU General Public License v3.0

Implement wildduck's storage architecture for efficiency and scalability #291

Open figassis opened 3 years ago

figassis commented 3 years ago

Use case

What problem are you trying to solve? Maildir is less space-efficient and less scalable than a clustered database as a mail store.

Note alternatives you considered and why they are not useful. I've tried using Maildir over an S3 backend, but performance can be an issue.

Your idea for a solution

Compress messages, deduplicate attachments and store in a clustered database like MongoDB.

How would your solution work in general? Wildduck stores messages and attachments in MongoDB. It compresses data and deduplicates attachments, greatly reducing storage requirements and allowing us to easily scale our deployments. I currently use it in production and it works great.

foxcpp commented 3 years ago

My current idea for a distributed/scalable deployment is to put go-imap-sql on top of CockroachDB, with message blobs stored in object storage (e.g. S3). All of this is tracked in https://github.com/foxcpp/maddy/issues/279.

Attachment deduplication may be worth exploring though.

figassis commented 3 years ago

I agree. Wildduck probably gets most of its gains from attachment deduplication rather than from the specific storage backend. Deduplication can easily be done by storing attachment hashes, and may even improve performance, since you would often not need to send a file to storage at all. Deleting a message with attachments would remove the file and its hash only if it is the last message pointing to it.
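
A minimal Go sketch of the hash-based scheme described above, assuming SHA-256 as the content hash and an in-memory map standing in for the real blob backend. `AttachmentStore`, `Put`, `Release`, and `Len` are hypothetical names for illustration, not part of maddy or Wildduck.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

// AttachmentStore is a hypothetical content-addressed store.
// Each blob is keyed by its SHA-256 hash and reference-counted.
type AttachmentStore struct {
	mu    sync.Mutex
	blobs map[string][]byte // hash -> attachment body
	refs  map[string]int    // hash -> number of messages using it
}

func NewAttachmentStore() *AttachmentStore {
	return &AttachmentStore{
		blobs: make(map[string][]byte),
		refs:  make(map[string]int),
	}
}

// Put registers one more reference to body and returns its hash.
// stored is true only when the body was not already present, i.e.
// when bytes actually had to be written to the backend.
func (s *AttachmentStore) Put(body []byte) (key string, stored bool) {
	sum := sha256.Sum256(body)
	key = hex.EncodeToString(sum[:])

	s.mu.Lock()
	defer s.mu.Unlock()
	if s.refs[key] == 0 {
		s.blobs[key] = append([]byte(nil), body...) // private copy
		stored = true
	}
	s.refs[key]++
	return key, stored
}

// Release drops one reference. The blob and its hash are deleted
// only when the last message pointing to it goes away; in that
// case Release returns true.
func (s *AttachmentStore) Release(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	n, ok := s.refs[key]
	if !ok {
		return false
	}
	if n > 1 {
		s.refs[key] = n - 1
		return false
	}
	delete(s.refs, key)
	delete(s.blobs, key)
	return true
}

// Len reports how many distinct blobs are currently stored.
func (s *AttachmentStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.blobs)
}
```

In a real deployment the reference count would live in the metadata database and be updated in the same transaction as message creation or deletion, so that a crash cannot leave a blob with a wrong count.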

I'm not very familiar with the codebase, but I do have Go experience, so I can help as soon as I find some bandwidth.

theduke commented 3 years ago

I second the S3 backend.

That also enables any S3-compatible storage, which can easily be self-hosted with MinIO.

Avamander commented 3 years ago

I do not want the maintenance burden of a separate server/machine/etc., be it Wildduck, Maildir, S3, or CockroachDB.

I would appreciate the ability to store my mail in the same database as the metadata (e.g. PostgreSQL). Maybe not in the same table as the metadata, but still. This would make consistent backups trivial and advanced search, filtering, and analysis much easier. The same applies to attachments: it would make things like deduplication trivial.

foxcpp commented 3 years ago

Early versions of the imapsql backend stored message contents as a blob in the same table as the metadata. That turned out to be a performance problem. Message contents are now stored in an abstracted "external storage", and the only implementation currently available is a filesystem directory. It is definitely possible to add an implementation that just stores blobs in table rows. This should not cause performance problems as long as that table is separate from the metadata.
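
To make the "blobs in a separate table" idea concrete, here is a rough Go sketch built on `database/sql`. Everything in it is an assumption for illustration: the `TableBlobStore` type, the `msg_blobs` table, and the PostgreSQL-style `BYTEA` column and `$1` placeholders are hypothetical, and this is not the actual external storage interface used by go-imap-sql.

```go
package main

import (
	"context"
	"database/sql"
)

// blobSchema keeps message bodies in their own table, so that scans
// over the metadata tables never touch the large body column.
// PostgreSQL syntax is assumed here.
const blobSchema = `
CREATE TABLE IF NOT EXISTS msg_blobs (
	blob_key TEXT PRIMARY KEY,
	body     BYTEA NOT NULL
)`

// TableBlobStore is a hypothetical blob store backed by table rows.
type TableBlobStore struct {
	db *sql.DB
}

func NewTableBlobStore(db *sql.DB) *TableBlobStore {
	return &TableBlobStore{db: db}
}

// Init creates the blob table if it does not exist yet.
func (s *TableBlobStore) Init(ctx context.Context) error {
	_, err := s.db.ExecContext(ctx, blobSchema)
	return err
}

// Put writes one message body under the given key.
func (s *TableBlobStore) Put(ctx context.Context, key string, body []byte) error {
	_, err := s.db.ExecContext(ctx,
		`INSERT INTO msg_blobs (blob_key, body) VALUES ($1, $2)`, key, body)
	return err
}

// Get reads a message body back. It returns sql.ErrNoRows when the
// key is unknown.
func (s *TableBlobStore) Get(ctx context.Context, key string) ([]byte, error) {
	var body []byte
	err := s.db.QueryRowContext(ctx,
		`SELECT body FROM msg_blobs WHERE blob_key = $1`, key).Scan(&body)
	return body, err
}

// Delete removes a message body.
func (s *TableBlobStore) Delete(ctx context.Context, key string) error {
	_, err := s.db.ExecContext(ctx,
		`DELETE FROM msg_blobs WHERE blob_key = $1`, key)
	return err
}
```

Because the metadata rows would only carry `blob_key`, mailbox listings and flag updates never read the body column, while a single database dump still captures both metadata and message contents consistently. A real implementation would likely stream large bodies rather than hold them in a `[]byte`.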