Mailu / helm-charts

Development repo for helm charts

Add StatefulSet deployment for postfix #304

Open Deltachaos opened 1 year ago

Deltachaos commented 1 year ago

We can run multiple postfix instances in a StatefulSet to increase HA

fastlorenzo commented 1 year ago

Thank you for your PR. We've discussed HA at length in the Mailu Matrix chat. The one remaining issue is that full HA requires true HA for Dovecot, which is officially supported only in the paid version for now. I'll have a look at this PR, but please keep that limitation in mind.

Deltachaos commented 1 year ago

Even if there is no full HA, partial HA is better than nothing.

github-actions[bot] commented 11 months ago

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

Sacerdoss commented 11 months ago

Would love to have the option. Can't Dovecot HA be achieved with a ReadWriteMany PVC? In a short test, unlike the postfix deployment, dovecot scaled up without problems and all pods were healthy.

Sacerdoss commented 11 months ago

I just manually created a StatefulSet for additional Postfix instances within the same namespace, and it works as expected. We scaled up the front deployment and made sure the new StatefulSet pods match the service selector of the mailu-postfix service so that load balancing works. Dovecot shows no problems at all with its replicas manually set to 3. The HA setup seems to work with this workaround.
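For readers who want to try the workaround described above, here is a minimal sketch of such a manifest, built as a Python dict and printed as JSON (which `kubectl apply -f -` accepts). Everything specific in it is an assumption rather than something taken from the chart: the function name `postfix_statefulset`, the `postfix-set` label key, the `/queue` mount path, the storage size, and the label and image values in the usage example are placeholders to check against your own deployment. The point it illustrates is only that the pod labels are a superset of the Service selector, and that `volumeClaimTemplates` gives each replica its own queue volume.

```python
import json
from typing import Any, Dict


def postfix_statefulset(
    name: str,
    service_selector: Dict[str, str],  # labels the existing postfix Service selects on
    image: str,                        # postfix image; supplied by the caller
    replicas: int = 2,
    queue_path: str = "/queue",        # assumed queue mount path; verify for your image
    storage: str = "1Gi",              # placeholder size
) -> Dict[str, Any]:
    """Build a StatefulSet manifest whose pods each get a private queue PVC."""
    # Pod labels include the Service selector (so the Service balances over them)
    # plus one extra, hypothetical label so this selector does not overlap with
    # the selector of the chart's own postfix Deployment.
    pod_labels = {**service_selector, "postfix-set": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # governing headless Service, assumed to exist
            "replicas": replicas,
            "selector": {"matchLabels": pod_labels},
            "template": {
                "metadata": {"labels": pod_labels},
                "spec": {
                    "containers": [{
                        "name": "postfix",
                        "image": image,
                        "volumeMounts": [
                            {"name": "queue", "mountPath": queue_path},
                        ],
                    }],
                },
            },
            # One PVC per replica: queues are separate, not shared or replicated.
            "volumeClaimTemplates": [{
                "metadata": {"name": "queue"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    manifest = postfix_statefulset(
        name="postfix-extra",                       # hypothetical name
        service_selector={"component": "postfix"},  # hypothetical labels
        image="REPLACE_WITH_POSTFIX_IMAGE",
    )
    print(json.dumps(manifest, indent=2))
```

Because each replica owns its queue, losing a pod delays the mail in that queue until the pod returns; it does not move that mail to another replica.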

Sacerdoss commented 11 months ago

> @Sacerdoss it doesn't and you just haven't realized it yet.
>
> dovecot will corrupt its indexes, at best you will get abysmal performance since you will be load-balanced in between back-ends.

I do see the problem. It probably just didn't come up in my small test setup.

> postfix won't work properly either; the mail queue is not meant to be replicated; If you do not replicate it you won't get HA (loss of one node/pod will lose you mails)

With a StatefulSet the queue won't be replicated, as each replica receives its own PVC with its own queue directory.

But isn't the point of an HA setup here just to bridge the downtime until the replica comes up again? For this use case you won't lose mails with separate queues. You just risk some delayed ones, which get delivered once the replica comes back up.

At least for us that would be more than good enough.

> admin has shared state that just won't be on other instances
>
> ... we could make it work but we are not there yet.

I did not look at that service yet. But thank you very much for your clarification.

nextgens commented 11 months ago

I have clarified on https://github.com/Mailu/helm-charts/pull/303#issuecomment-1779578714 what you can "scale up" today with 2.0

With master and some work on admin to "steer" clients from front to the same dovecot backends (like dovecot director would), #304 should be possible.

The code is at https://github.com/Mailu/Mailu/blob/master/core/admin/mailu/internal/nginx.py#L140 ... For a given username it should always redirect to the same backend (unless that backend is gone, in which case it should pick another). It's actually relatively straightforward to do: check redis; if there is an entry and that backend is still live (one of the entries still in DNS), use it; otherwise pick a backend randomly from DNS and update the cache in redis.
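The lookup described above could be sketched roughly as follows. This is not the code in `nginx.py`; the function name `pick_backend`, its parameters, and the use of a plain mapping in place of Redis are all assumptions made to keep the example self-contained. The caller supplies `resolve_backends`, a hypothetical callable that returns the backend addresses currently in DNS.

```python
import random
from typing import Callable, MutableMapping, Optional, Sequence


def pick_backend(
    username: str,
    cache: MutableMapping[str, str],                # stands in for Redis
    resolve_backends: Callable[[], Sequence[str]],  # addresses currently in DNS
    rng: Optional[random.Random] = None,
) -> str:
    """Return a sticky backend for `username`, re-picking if it has gone away."""
    live = list(resolve_backends())
    if not live:
        raise RuntimeError("no backend addresses in DNS")

    # Reuse the cached backend only while it is still listed in DNS.
    cached = cache.get(username)
    if cached in live:
        return cached

    # Otherwise pick at random among the live backends and remember the choice.
    chosen = (rng or random).choice(live)
    cache[username] = chosen
    return chosen
```

With a real Redis client the two cache operations would become a get and a set on a per-user key, probably with an expiry; that part is left out here on purpose.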

Deltachaos commented 1 month ago

@nextgens, you mentioned Dovecot in your comment, but this PR is still about Postfix. The Postfix service can be clustered without any issues when persistent volumes are not shared, as these volumes function only as a spool for Postfix. Emails are randomly delivered to one of the running Postfix services, and each service continues delivering the emails to their targets.

nextgens commented 1 month ago

This will still wreck rate limiting, both inbound and outbound... You may not care about it though.

Yes, now that what was master has been released, it's safer and will mostly work, and it does add some level of HA, provided you also scale front.
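One way to read the rate-limiting concern above is shown in the toy model below, which assumes each replica keeps its own counters. It is not Mailu's or Postfix's implementation; `FixedWindowLimiter` and `accepted` are hypothetical names. It only shows that N independent counters let a sender through up to N times the intended limit.

```python
from typing import Dict, List


class FixedWindowLimiter:
    """Allow at most `limit` messages per sender within a single window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: Dict[str, int] = {}

    def allow(self, sender: str) -> bool:
        used = self.counts.get(sender, 0)
        if used >= self.limit:
            return False
        self.counts[sender] = used + 1
        return True


def accepted(limiters: List[FixedWindowLimiter], sender: str, attempts: int) -> int:
    """Spread `attempts` round-robin over the replicas; count how many pass."""
    return sum(
        limiters[i % len(limiters)].allow(sender) for i in range(attempts)
    )


# One instance enforces the limit of 10 ...
print(accepted([FixedWindowLimiter(10)], "a@example.com", 30))  # 10
# ... three independent instances together let 30 through.
print(accepted([FixedWindowLimiter(10) for _ in range(3)], "a@example.com", 30))  # 30
```

Whether this matters depends on how strictly the limits need to hold, which matches the remark that some deployments may not care.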
