axllent / mailpit

An email and SMTP testing tool with API for developers
https://mailpit.axllent.org
MIT License

Remote storage #254

Closed ulexxander closed 6 months ago

ulexxander commented 8 months ago

Hello, we really like this project and use it every day, deploying it to dynamic staging environments on Kubernetes for each GitLab Merge (Pull) Request. We have a lot of them.

While our application is stateless and uses a database for storage, Mailpit is not and uses a local SQLite database, which requires us to create a PersistentVolume for each Merge Request to persist Mailpit data in that environment. We would like to avoid creating a disk for each Merge Request; it would simplify our infrastructure and improve performance.

Do you have any plans to implement remote storage that isn't bound to the local filesystem?

What do you think about https://rqlite.io ? I could try to contribute and implement it. It seems compatible, and implementing it as an alternative storage backend should be easy.

axllent commented 8 months ago

Hi @ulexxander. I'm wondering if there might be an easier way for you to solve this issue. Could you not just run a dedicated Mailpit server and send mail via SMTP directly to that instead? It would mean all mail from each client ends up in the same mailbox, however that could potentially be addressed by using tags or similar.

Implementing a secondary/alternative storage backend will (in my opinion) not be easy, as Mailpit was never designed to cater for that, so I fear a huge rewrite would be necessary to make this possible - I don't think it will be easy or quick. That being said, if you are willing to do the work and are somehow able to integrate rqlite as an optional alternative storage, then I have no objection.

I'm keen to discuss this further before you get started though, as I suspect there may be easy solutions already possible without resorting to any Mailpit rewrites.
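
For illustration, a minimal sketch of the tag suggestion above - a CI job sends its test mail to one shared Mailpit instance and tags it per merge request via the X-Tags message header. The host, port, sender and tag value here are placeholders, not anything from this thread:

package main

import (
	"log"
	"net/smtp"
)

func main() {
	// Tag the message with the merge request ID so it can be filtered
	// in the shared Mailpit UI (Mailpit reads tags from the X-Tags header).
	msg := []byte("From: ci@example.com\r\n" +
		"To: qa@example.com\r\n" +
		"Subject: Staging smoke test\r\n" +
		"X-Tags: mr-1234\r\n" +
		"\r\n" +
		"Test message from merge request 1234.\r\n")

	// A default Mailpit instance listens on port 1025 and needs no auth.
	err := smtp.SendMail("mailpit.example.com:1025", nil, "ci@example.com", []string{"qa@example.com"}, msg)
	if err != nil {
		log.Fatal(err)
	}
}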

ulexxander commented 8 months ago

@axllent Thanks for the explanation!

Actually, a proof of concept persisting data in rqlite was quite simple, and I was able to persist and display the mailbox from rqlite :smile:

diff --git a/internal/storage/database.go b/internal/storage/database.go
index 540d67c..7f79984 100644
--- a/internal/storage/database.go
+++ b/internal/storage/database.go
@@ -18,6 +18,10 @@ import (

        // sqlite (native) - https://gitlab.com/cznic/sqlite
        _ "modernc.org/sqlite"
+       // rqlite (remote) - https://github.com/rqlite/gorqlite
+       // https://rqlite.io/
+
+       _ "github.com/rqlite/gorqlite/stdlib"
 )

 var (
@@ -51,9 +55,9 @@ func InitDB() error {

        var err error

-       dsn := fmt.Sprintf("file:%s?cache=shared", p)
+       dsn := "http://localhost:4001?disableClusterDiscovery=true"

-       db, err = sql.Open("sqlite", dsn)
+       db, err = sql.Open("rqlite", dsn)
        if err != nil {
                return err
        }
@@ -62,12 +66,6 @@ func InitDB() error {
        // @see https://github.com/mattn/go-sqlite3#faq
        db.SetMaxOpenConns(1)

-       // SQLite performance tuning (https://phiresky.github.io/blog/2020/sqlite-performance-tuning/)
-       _, err = db.Exec("PRAGMA journal_mode = WAL; PRAGMA synchronous = normal;")
-       if err != nil {
-               return err
-       }
-
        // create tables if necessary & apply migrations
        if err := dbApplyMigrations(); err != nil {
                return err

But probably because the rqlite database driver is built on a JSON API, it behaves a little differently than the SQLite one, so while testing I needed to make a few more changes:

  1. mailbox_data.Email is saved as base64 (otherwise serialization of []byte to JSON breaks the compressed BLOB)
  2. it's unable to Scan() an SQL INTEGER into a Go int64 - probably because it is a number in the JSON response and the driver decodes it into an interface{}, so the json package unmarshals it into a float64.
  3. the INSERT ... RETURNING clause also didn't work because rqlite disallows modifying data on the /query endpoint - the Query() method is used instead of Exec() in the code to fetch the result - I didn't resolve that.
diff --git a/internal/storage/messages.go b/internal/storage/messages.go
index 03da204..c490e9a 100644
--- a/internal/storage/messages.go
+++ b/internal/storage/messages.go
@@ -4,6 +4,7 @@ import (
        "bytes"
        "context"
        "database/sql"
+       "encoding/base64"
        "encoding/json"
        "errors"
        "fmt"
@@ -102,7 +103,8 @@ func Store(body *[]byte) (string, error) {

        // insert compressed raw message
        compressed := dbEncoder.EncodeAll(*body, make([]byte, 0, size))
-       _, err = tx.Exec("INSERT INTO mailbox_data(ID, Email) values(?,?)", id, string(compressed))
+       compressedBase64 := base64.StdEncoding.EncodeToString(compressed)
+       _, err = tx.Exec("INSERT INTO mailbox_data(ID, Email) values(?,?)", id, compressedBase64)
        if err != nil {
                return "", err
        }

@@ -341,7 +343,12 @@ func GetMessageRaw(id string) ([]byte, error) {
                return nil, errors.New("message not found")
        }

-       raw, err := dbDecoder.DecodeAll([]byte(msg), nil)
+       msgCompressed, err := base64.StdEncoding.DecodeString(msg)
+       if err != nil {
+               return nil, fmt.Errorf("error base64 decoding message: %w", err)
+       }
+
+       raw, err := dbDecoder.DecodeAll(msgCompressed, nil)
        if err != nil {
                return nil, fmt.Errorf("error decompressing message: %s", err.Error())
        }
@@ -155,7 +157,7 @@ func List(start, limit int) ([]MessageSummary, error) {
                Offset(start)

        if err := q.QueryAndClose(nil, db, func(row *sql.Rows) {
-               var created int64
+               var created float64
                var id string
                var messageID string
                var subject string
@@ -176,7 +178,7 @@ func List(start, limit int) ([]MessageSummary, error) {
                        return
                }

-               em.Created = time.UnixMilli(created)
+               em.Created = time.UnixMilli(int64(created))
                em.ID = id
                em.MessageID = messageID
                em.Subject = subject
ulexxander commented 8 months ago

But there are probably a lot more changes required for all of Mailpit's logic to work normally with rqlite as storage.

So I got another idea for how to satisfy my use case without needing to modify Mailpit: I will get rid of the PersistentVolume for each Mailpit. Instead I will add a sidecar container to each Mailpit that checks the SQLite database for changes and performs backups to some shared storage. Then on startup (restart) an initContainer will download the backup and restore it!

I will share the solution if I succeed.

axllent commented 8 months ago

Thanks for the info so far @ulexxander, the implementation you demonstrated is already significantly simpler than I had imagined. The need to base64-encode the binary e-mail data is a bummer however, as it would greatly increase the storage size of the database (the whole point of zstd compression is to reduce the DB storage size).

Anyway, please keep me in the loop with your progress/solution. I'm not sure if NFS or something similar is a feasible alternative (or even possible in Kubernetes), although I think the options I use with SQLite aren't friendly with remote filesystems (those could be conditionally toggled with a startup flag though).

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 14 days with no activity.

axllent commented 7 months ago

@ulexxander Just to keep this issue "alive" (so it's not marked as stale) - any progress on your end? I did see your activity in rqlite so it looks like you are still active :+1:

ulexxander commented 7 months ago

Hello @axllent, I haven't had the opportunity to implement it at work yet - too many other tasks. But I plan to do it very soon; it should be quick anyway. I'll report how it goes 😀

Interesting that there was related activity on rqlite - I don't remember mentioning this issue there, but the folks there somehow found it and resolved one of the problems. Nice 😀

axllent commented 7 months ago

You're totally right, it wasn't you (I incorrectly assumed it was). I guess the author of rqlite may have seen this discussion and thought it was a great idea to promote the implementation. For what it's worth, I believe rqlite does support BLOBs too (coming back to the earlier base64 comment). I'm actually pretty excited to see its performance (I know it won't be as fast as native/local, but I'm really curious how it performs over HTTP), and whether it can (almost) work as a drop-in replacement!

I am also very busy at the moment with other priorities including work, so I haven't had time to look into rqlite beyond a bit of light reading, and besides... you're doing this piece of work, which I really appreciate :-) I've already put in hundreds of hours of work so it's really great to have some help. I have been giving it a lot of thought as to how one could potentially "integrate" it from an end-user perspective without over-complicating the user options - assuming rqlite can be integrated as simply a DB driver supporting the same SQL commands, the existing --db-file could be dual-purpose: if it's a file path then use SQLite, else if it's a URL then use rqlite.

Anyway, I'll leave this with you. Nice to hear you're still working on it.
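
A rough sketch of that dual-purpose --db-file idea, assuming (as in the proof-of-concept diff above) that the gorqlite stdlib package registers itself as the "rqlite" driver - this is not Mailpit's actual code:

package storage

import (
	"database/sql"
	"fmt"
	"strings"

	_ "github.com/rqlite/gorqlite/stdlib" // registers the "rqlite" driver
	_ "modernc.org/sqlite"                // registers the "sqlite" driver
)

// openDB treats an http(s) URL as an rqlite endpoint and anything else
// as a local SQLite file path, so a single flag can serve both backends.
func openDB(dbFile string) (*sql.DB, error) {
	if strings.HasPrefix(dbFile, "http://") || strings.HasPrefix(dbFile, "https://") {
		return sql.Open("rqlite", dbFile)
	}
	dsn := fmt.Sprintf("file:%s?cache=shared", dbFile)
	return sql.Open("sqlite", dsn)
}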

ulexxander commented 7 months ago

Hello @axllent, I implemented my idea - backing up Mailpit's SQLite database to S3 frequently instead of mounting persistent storage into the container.

And I am satisfied with it for now; it works great.

Sharing it here:

  1. Build and push your Docker image including the AWS CLI, gzip and sqlite3 (mine has a few more utilities).
FROM ubuntu:jammy
WORKDIR /build

RUN \
  apt-get update -qq && \
  DEBIAN_FRONTEND="noninteractive" \
  apt-get install -qq -y --no-install-recommends \
  ca-certificates \
  curl \
  jq \
  mc \
  sqlite3 \
  unzip \
  vim \
  zip \
  && apt-get clean

RUN \
  curl -o awscli.zip "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-2.15.32.zip" && \
  unzip -q awscli.zip && \
  ./aws/install && \
  rm -rf aws awscli.zip

WORKDIR /app
  2. Create an AWS IAM role or user with the allowed actions:
  3. Deploy Mailpit with an initContainer db-restore that restores the last backup, and a sidecar container db-backup that checks the checksum of the database and backs it up to AWS S3 when it changes. Adjust the sleep 10 interval for yourself. If you have a small Mailpit database like I do, it should be very lightweight. I benchmarked it by doing checksums 10 times per second with a 400KB database - it only consumed 0.1 CPU on a t3.large instance.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mailpit
  labels:
    app.kubernetes.io/name: mailpit
spec:
  replicas: 1
  strategy:
    # At most one instance should run during upgrades, because of backup mechanism.
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: mailpit
  template:
    metadata:
      labels:
        app.kubernetes.io/name: mailpit
    spec:
      # To authenticate with S3 either use a ServiceAccount (if you are on AWS EKS)
      # or create access token and load it in environment variables of "db-restore" and "db-backup" containers.
      # For EKS: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
      serviceAccountName: mailpit

      imagePullSecrets:
        - name: your-container-registry

      volumes:
        - name: data
          emptyDir: {}

      # Instead of using PersistentVolume and need to create it for each merge request,
      # we will backup database to S3 frequently if it changes and restore latest backup on redeploy.
      initContainers:
        - name: db-restore
          image: your-registry/aws-cli-db:v1
          imagePullPolicy: IfNotPresent
          workingDir: /data
          command:
            - sh
            - -c
            - |-
              set -e
              aws s3 cp s3://${S3_BUCKET}/${S3_OBJECT} mailpit.db.gz || CP_FAIL=true
              if [ "$CP_FAIL" = "true" ]; then echo "Nothing to restore"; exit 0; fi
              gzip -df mailpit.db.gz
              echo "Restored existing database backup"
          env:
            - name: AWS_DEFAULT_REGION
              value: your-aws-region-1
            - name: S3_BUCKET
              value: my-mailpit
            - name: S3_OBJECT
              value: mailpit.db.gz
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              cpu: 5m
              memory: 16Mi
            limits:
              memory: 128Mi

      containers:
        - name: mailpit
          image: axllent/mailpit:v1.15.0
          imagePullPolicy: IfNotPresent
          ports:
            - name: mailpit-smtp
              containerPort: 1025
            - name: mailpit-ui
              containerPort: 8025
          volumeMounts:
            - name: data
              mountPath: /data
          env:
            - name: MP_DATA_FILE
              value: /data/mailpit.db
          readinessProbe:
            tcpSocket:
              port: mailpit-ui
          resources:
            requests:
              cpu: 5m
              memory: 16Mi
            limits:
              memory: 128Mi

        - name: db-backup
          image: your-registry/aws-cli-db:v1
          imagePullPolicy: IfNotPresent
          workingDir: /data
          command:
            - sh
            - -c
            - |-
              set -e
              trap exit TERM
              echo "Starting continuous backup"
              while true; do
                sleep 10 & wait

                OLD_HASH=$(cat mailpit.db.sum || true)
                NEW_HASH=$(sqlite3 mailpit.db ".sha3sum --schema")
                if [ "$NEW_HASH" = "$OLD_HASH" ]; then continue; fi
                echo "Database file has changed, performing backup"
                echo "Old: $OLD_HASH"
                echo "New: $NEW_HASH"

                sqlite3 mailpit.db ".backup mailpit.db.bak"
                gzip -kf mailpit.db.bak
                aws s3 cp mailpit.db.bak.gz s3://${S3_BUCKET}/${S3_OBJECT}
                echo "$NEW_HASH" > mailpit.db.sum
              done
          env:
            - name: AWS_DEFAULT_REGION
              value: your-aws-region-1
            - name: S3_BUCKET
              value: my-mailpit
            - name: S3_OBJECT
              value: mailpit.db.gz
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              cpu: 5m
              memory: 16Mi
            limits:
              memory: 128Mi
ulexxander commented 7 months ago

Alternative solutions could use NFS (I personally avoid it) or FUSE userspace filesystems like sshfs and s3fs - but those would require a privileged container.

axllent commented 7 months ago

Thanks for sharing that. Did the idea of using rqlite not work?

axllent commented 7 months ago

@ulexxander I realise you have already found an alternate approach, but I have been playing with rqlite and I now have a local prototype which is almost fully working. It required a fair number of minor internal changes, eg: int64 -> float64 as that appears to be what rqlite returns in many cases via JSON, and an alternate approach for some methods, but it's working as expected.

I hit the same issue as you with the BLOB storage of the compressed raw emails. After quite a bit of hunting and searching I believe I found the "correct" (alternative) way to insert binary data via the sql driver (I'm waiting on confirmation from the rqlite author) which can be used for both the SQLite and rqlite databases, the only difference being that rqlite returns binary BLOB data as a base64-encoded string (which I can easily work around).

DB writes (via SMTP) are significantly slower than with the default driver (about 50-75% slower), however even that is still very reasonable (depending on platform/hardware, about 50 messages per second). I don't think that's an issue at all for anyone (unless you're testing mass mail bombs).

I'm not sure if you're still interested (?), but it is looking very promising so far so I thought I'd update you.
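
As a hedged illustration of the BLOB handling described above (not the actual Mailpit change - the exact driver behaviour was still awaiting confirmation at this point), one way to insert the compressed message as raw bytes and decode it on read regardless of which driver is in use:

package storage

import (
	"database/sql"
	"encoding/base64"
	"fmt"
)

// storeBlob inserts the compressed message as []byte and lets the driver
// decide how to encode the binary value.
func storeBlob(db *sql.DB, id string, compressed []byte) error {
	_, err := db.Exec("INSERT INTO mailbox_data(ID, Email) VALUES(?, ?)", id, compressed)
	return err
}

// loadBlob reads a compressed message back from mailbox_data. The local SQLite
// driver returns the BLOB as []byte, while rqlite's JSON API returns it as a
// base64-encoded string, so both cases are handled here.
func loadBlob(db *sql.DB, id string) ([]byte, error) {
	var raw interface{}
	if err := db.QueryRow("SELECT Email FROM mailbox_data WHERE ID = ?", id).Scan(&raw); err != nil {
		return nil, err
	}

	switch v := raw.(type) {
	case []byte: // SQLite: raw BLOB bytes
		return v, nil
	case string: // rqlite: base64-encoded string
		return base64.StdEncoding.DecodeString(v)
	default:
		return nil, fmt.Errorf("unexpected column type %T", raw)
	}
}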

ulexxander commented 7 months ago

Hello @axllent. The alternative solution using backups works quite well and there have been no issues with it. I chose it because it was quick to implement.

But a solution with proper remote storage would definitely be more elegant and way better.

I think for most people, including me, worse performance compared to the default driver won't be an issue at all.

Big thanks for doing the work on that. I would definitely switch to it once it is implemented and stable.

axllent commented 6 months ago

@ulexxander I have just merged this new functionality, but would love it if you could please do some testing before I release anything (if possible)? Currently only the axllent/mailpit:edge Docker image contains this feature, and I think it should be fairly easy to integrate: all you have to do is set MP_DATA_FILE=http://<rqlite-server>:<port> (or --db-file http://<rqlite-server>:<port>).

Obviously this solution is highly dependent on the rqlite server running beforehand, and if the connection is broken then bad things can happen (I'm not too sure how the internals of reconnection work, but I assume that if it is a cluster then it potentially moves to the next node??).

Anyway, I would love (and really appreciate) your feedback before I release anything!

ulexxander commented 6 months ago

I tested Mailpit with rqlite today on our staging environment and it works fine! I didn't have any issues and didn't notice any difference in performance.

I ran a bunch of automated end-to-end tests that use it and they worked just fine 👍

But unfortunately I ran into one limitation of rqlite that I had not noticed before - you cannot have multiple databases (tenants, isolated schemas) on a single instance, like in PostgreSQL / MySQL / MongoDB. My bad...

https://github.com/rqlite/rqlite/issues/927

This is a deal breaker for me, as I would still have to run a stateful application with storage for each stage environment deployed on demand...

Unless I implement some wrapper software that manages / deploys multiple on-demand rqlite instances inside the same container, attached to the same storage (separate directories for each instance).

otoolep commented 6 months ago

But unfortunately I ran into one limitation of rqlite that I had not noticed before - you cannot have multiple databases (tenants, isolated schemas) on a single instance

See https://github.com/rqlite/rqlite/issues/927#issuecomment-2039733913

otoolep commented 6 months ago

I would still have to run stateful application with storage for each stage environment deployed on demand...

That is not your only option, as explained in the comment above. There are other ways. Happy to explain more if needed.

ulexxander commented 6 months ago

Thanks @otoolep, this is a great suggestion. This level of isolation is totally ok for our use case.

Use a naming scheme, such that tables are prefixed with a tenant ID. E.g. instead of doing SELECT FROM foo you do SELECT FROM $TENANT_foo.

@axllent what do you think about this? Introducing a tenant name and injecting it into all schema migrations and queries would probably be a relatively simple change? It's also a feature that regular SQLite3 users could benefit from.

otoolep commented 6 months ago

The main thing is to provide tenancy information to the right layer in your application, and then let that layer decide how it implements multi-tenancy. Different database technology will offer different ways to then implement the isolation (containers, distinct databases, tables per tenant, etc).

The approach I suggest is a common, though older, pattern. I first used it more than 10 years ago to build multi-tenant SaaS systems. It's still used to this day in some systems, though with the advent of containerization it's becoming more common to deploy dedicated instances of a database per tenant (though the spin-up time for a new tenant can be high). That approach usually maximizes isolation and helps manage resources (CPU, disk, RAM) on a per-tenant basis.

But using the table-prefix approach, or the extra tenant ID column approach, a single rqlite instance can offer reasonable isolation per tenant. And for multi-tenant systems with large or unpredictable numbers of tenants, the approach I suggest is fast with little overhead.

axllent commented 6 months ago

Thank you both for your feedback and discussion @ulexxander & @otoolep. This is a major oversight (@ulexxander), and unfortunately not one that is easily overcome without some significant internal changes and potential performance drawbacks. You caught me right at the start of a busy weekend (other personal commitments) so I'll just share my initial thoughts now.

Adding an extra column to all tables isn't a good option, as Mailpit requires a specific database migration state (per "client") when it connects. When it detects that the current state is behind, it will run the necessary migration tasks, updating the table structure and running any necessary migrations. The moment an older Mailpit client connects (after database changes have been made by a newer client), it would completely shit the bed because the database would be in a future state it doesn't understand. Assuming this could be worked around (I don't think so though), there are also other performance-related concerns I have, but I won't get into those now as I would need more time to consider what these might be.

Adding a table name prefix is the other option (as I gather). For starters this would require a custom port of the migration module Mailpit uses, as there is currently no flexibility to handle this (the table name that maintains the state of the database is hardcoded). Every instance (specifying a "prefix") would have a separate set of tables. This approach should work (I think), but again requires some core changes within Mailpit and the migration code.

I'll keep giving this some thought, and in the meantime if either of you have other ideas and/or suggestions I'm all ears (just a little delayed). Thanks!

axllent commented 6 months ago

I've started work on an optional --db-tenant-id feature which will prefix all tables with a SQL-safe prefix of whatever is set. It's early stages, but it seems to be working as expected. Part of this included replacing the original database migration module, which was fortunately easier than I expected, at least for what I need. I still need to do a lot of testing in the coming days, but I'll let you know @ulexxander when I have something ready for testing.

Still interested in any feedback / thoughts (if either of you have any), but this solution should mean that rqlite can be used from multiple hosts at the same time provided they each specify a unique tenant ID. Technically local storage can use this too, however I can't really see a use case for that, and I'm also fairly sure there would be locking conflicts if two local Mailpit servers were running and both accessing the same DB file (SQLite was never designed for that).
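
A minimal sketch of how a tenant ID might be reduced to a SQL-safe table prefix - the function name and sanitisation rule here are assumptions for illustration, not Mailpit's actual implementation:

package storage

import (
	"regexp"
	"strings"
)

// nonSafeChars matches anything that is not safe inside an unquoted table name.
var nonSafeChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// tenantTable prefixes a table name with a sanitised tenant ID,
// e.g. tenantTable("mr-1234", "mailbox") -> "mr_1234_mailbox".
func tenantTable(tenantID, table string) string {
	prefix := nonSafeChars.ReplaceAllString(strings.TrimSpace(tenantID), "_")
	if prefix == "" {
		return table
	}
	return prefix + "_" + table
}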

axllent commented 6 months ago

@ulexxander I have just pushed a major internal change that adds optional support for a tenant ID. In essence, this prefixes all Mailpit tables with a tenant ID (name) if set. As before, this feature is currently only in the edge Docker build, and I'd really appreciate your feedback & testing, and I'm sure @otoolep would also be keen to hear how it performs with rqlite.

At the moment there is no online documentation as I don't like to add references to features that don't exist (yet), so... you can set this via --db-tenant-id <name> (or env MP_DB_TENANT_ID="<name>"). This should allow seamless integration with rqlite provided each Mailpit client specifies its own tenant ID.

ulexxander commented 6 months ago

Hello @axllent! Thanks for implementing that. I tested it today and it works great! But I had to set MP_DB_TENANT - MP_DB_TENANT_ID didn't work. I saw in the code that you are using the former.

axllent commented 6 months ago

Oh shit, that's my bad. I couldn't make up my mind whether to refer to it as a "tenant" or a "tenant ID" (any thoughts?). Glad to hear it's working for you though!

I'm actually strongly considering changing MP_DATA_FILE to MP_DATABASE (keeping the old one for compatibility's sake, of course) as that's more reflective, so the tenant would become either just MP_TENANT or MP_TENANT_ID.

axllent commented 6 months ago

I am pleased to announce v1.16.0, which contains optional integration with rqlite. Thanks to you both - @ulexxander for the idea, testing and help, and of course @otoolep for your database & advice!

This release includes one new flag/env variable and one changed one (as I mentioned before).

I am still very keen to hear about performance and how this handles in the field, as I suspect this feature may start gaining a fair bit of traction in time (it's always hard to tell how exactly users are using Mailpit). Recently the Mailpit Docker downloads jumped to a constant ±1.8 million downloads per day (yes, you read that right) - which is slightly higher than the official MySQL Docker images! :exploding_head:

otoolep commented 6 months ago

Hey @axllent -- great to see. Thanks to you both for integrating your program with rqlite.

BTW, can you give some pointers on how you generate multi-arch Docker images? I'm getting a lot of requests for the same for rqlite, but the best practices for doing so are not clear to me. What do you do?

axllent commented 6 months ago

Yes of course @otoolep - it depends on where you want to build them, i.e. yourself or via CI. I have found the easiest way is to build them directly via GitHub Actions (see this), which also pushes them through to Docker Hub. If you want to discuss it more, please email me (axllent [at] gmail) - I'm just heading off to bed now but will get back to you ASAP.

Edit: It's worth noting that it is not difficult to build them locally either, but you need to have docker-buildx installed (if you're on Linux; I have no idea about other platforms):

docker buildx build --platform linux/amd64,linux/arm64/v8 --pull -t blaah/bleep --push .

Basically it emulates the "other platforms" (those differing from yours), but note that this process is very slow, especially if you are compiling things. All architectures are then combined into one manifest and pushed as one image to wherever you are pushing them.

otoolep commented 6 months ago

OK, GitHub Actions - good enough. That is one path I'm investigating, so it's good to know that's what you use. Seems like the path I should pursue too.

otoolep commented 5 months ago

Thanks again for your help @axllent -- I got it working today, and now rqlite has multi-platform images.

https://www.philipotoole.com/rqlite-8-24-9-now-with-multi-platform-docker-images/

axllent commented 5 months ago

I saw that @otoolep, nice! You're very welcome.