deltachat / chatmail

chatmail service deployment scripts and docs
https://delta.chat/en/2023-12-13-chatmail
MIT License

Authentication from Postfix to Dovecot is flaky #273

Open link2xt opened 2 months ago

link2xt commented 2 months ago

Delta Chat core CI sometimes fails to create an account when running the Python tests, with the error "transient: 4.7.0 Temporary authentication failure: Connection lost to authentication server".

In the Postfix logs I found this:

Apr 25 11:39:24 nine postfix/smtps/smtpd[986002]: warning: unknown[52.159.136.144]: SASL PLAIN authentication failed: (reason unavailable), sasl_username=$ci-jv34ug@nine.testrun.org
Apr 25 11:39:24 nine postfix/smtps/smtpd[986002]: too many errors after AUTH from unknown[52.159.136.144]
Apr 25 11:39:24 nine postfix/smtps/smtpd[986002]: disconnect from unknown[52.159.136.144] ehlo=1 auth=0/1 commands=1/2

So it seems Dovecot sometimes gets overloaded by authentication requests from Postfix. On the Dovecot side I don't see anything related in the log, so it looks like the socket queue overflowed and the connection failed at the kernel level without ever reaching the Dovecot process.

If possible, we should increase the queue size on the Dovecot socket or make authentication processing faster.

link2xt commented 1 month ago

Just happened again: https://github.com/deltachat/deltachat-core-rust/actions/runs/9135047671/job/25121780037?pr=5592

link2xt commented 2 weeks ago

As a next step to make this actionable, it would be nice to set up mtail or grok_exporter to convert these failures into metrics, and to have a dashboard showing the rate of these errors without needing to look into the logs.
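
To illustrate the kind of counting such an exporter would do, here is a minimal Python sketch. It is not an mtail or grok_exporter configuration, only a stand-in for the same idea. The regular expression is an assumption derived from the three log lines quoted earlier in this issue, and the names `SASL_FAILURE`, `count_sasl_failures` and `SAMPLE_LOG` are hypothetical; other Postfix versions or syslog formats may need a different pattern.

```python
import re
from collections import Counter

# Assumed pattern, based only on the log lines quoted in this issue:
# "<Mon> <day> <HH:MM:SS> <host> postfix/<service>[<pid>]: warning: <client>: SASL <mech> authentication failed: <reason>, ..."
SASL_FAILURE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+"
    r"postfix/\S+\[\d+\]: warning: \S+: "
    r"SASL \S+ authentication failed: (?P<reason>[^,]*)"
)


def count_sasl_failures(lines):
    """Count SASL authentication failures per (minute, reason)."""
    counts = Counter()
    for line in lines:
        match = SASL_FAILURE.match(line)
        if match:
            counts[(match["minute"], match["reason"].strip())] += 1
    return counts


# The log excerpt from the first comment, used as sample input.
SAMPLE_LOG = """\
Apr 25 11:39:24 nine postfix/smtps/smtpd[986002]: warning: unknown[52.159.136.144]: SASL PLAIN authentication failed: (reason unavailable), sasl_username=$ci-jv34ug@nine.testrun.org
Apr 25 11:39:24 nine postfix/smtps/smtpd[986002]: too many errors after AUTH from unknown[52.159.136.144]
Apr 25 11:39:24 nine postfix/smtps/smtpd[986002]: disconnect from unknown[52.159.136.144] ehlo=1 auth=0/1 commands=1/2
"""

if __name__ == "__main__":
    failures = count_sasl_failures(SAMPLE_LOG.splitlines())
    for (minute, reason), n in sorted(failures.items()):
        print(f"{minute} {reason}: {n}")
```

Grouping by minute gives a rough failure rate, and keeping the reason string separate would let a dashboard distinguish "(reason unavailable)" failures from ordinary wrong-password ones.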