deltachat / chatmail

chatmail service deployment scripts and docs
https://delta.chat/en/2023-12-13-chatmail
MIT License
97 stars 5 forks source link

Temporary 4.4.2 error during SMTP sending #315

Open link2xt opened 1 month ago

link2xt commented 1 month ago

I have noticed that deltachat-core-rust CI fails on test_qr_readreceipt frequently.

One of the logs:

DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'ConnectivityChanged'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:388: SMTP failed to send: Transient(Response { code: Code { severity: TransientNegativeCompletion, category: Connections, detail: One }, message: ["4.4.2 *** Error: timeout exceeded"] })'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:442: Transient error with status code 4.4.2, postponing retry for later'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:467: Failed to send message over SMTP, disconnecting'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:783: Temporary SMTP failure while sending an MDN for Msg#21'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/scheduler.rs:787: smtp fake idle - started'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'ConnectivityChanged'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/scheduler.rs:815: smtp has no messages to retry, waiting for interrupt'}
DEBUG    root:rpc.py:180 account_id=2 got an event {'kind': 'ConnectivityChanged'}
DEBUG    root:rpc.py:180 account_id=1 got an event {'kind': 'Info', 'msg': 'src/imap.rs:1380: Passing message UID 15 to receive_imf().'}
DEBUG    root:rpc.py:180 account_id=1 got an event {'kind': 'Info', 'msg': 'src/receive_imf.rs:223: Receiving message "Mr.-Z5a6e3Kv0i.MInvyWTN

Another log:

DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/message.rs:1670: Seen message Msg#21.'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'chatId': 14, 'kind': 'MsgsNoticed'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'chatId': 14, 'kind': 'ChatlistItemChanged'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/scheduler.rs:819: smtp fake idle - interrupted'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:696: Selected rows from SMTP queue: [].'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:801: Sending MDNs'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/e2ee.rs:66: Peerstate for "ci-axrk5n@***" is mutual.'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'ConnectivityChanged'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:388: SMTP failed to send: Transient(Response { code: Code { severity: TransientNegativeCompletion, category: Connections, detail: One }, message: ["4.4.2 *** Error: timeout exceeded"] })'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:442: Transient error with status code 4.4.2, postponing retry for later'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:467: Failed to send message over SMTP, disconnecting'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/smtp.rs:783: Temporary SMTP failure while sending an MDN for Msg#21'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/scheduler.rs:787: smtp fake idle - started'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'ConnectivityChanged'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'Info', 'msg': 'src/scheduler.rs:815: smtp has no messages to retry, waiting for interrupt'}
DEBUG    root:rpc.py:180 account_id=2 got an event {'kind': 'Info', 'msg': 'src/imap.rs:747: 0 mails read from "INBOX".'}
DEBUG    root:rpc.py:180 account_id=1 got an event {'kind': 'Info', 'msg': 'src/imap/idle.rs:60: INBOX: Idle entering wait-on-remote state'}
DEBUG    root:rpc.py:180 account_id=3 got an event {'kind': 'ConnectivityChanged'}

Postfix documentation says timeout changes to 10 seconds "under overload": https://www.postfix.org/postconf.5.html#smtpd_timeout

Looking at journalctl -t postfix/smtps/smtpd, I see a lot of messages like this:

May 16 18:45:12 nine postfix/smtps/smtpd[2261768]: proxy-accept: END-OF-MESSAGE: 250 OK; from=<ci-fzmd83@nine.testrun.org> to=<ci-fzmd83@nine.testrun.org> proto=ESMTP helo=<[127.0.0.1]> sasl_username=<ci-fzmd83@nine.testrun.org>
May 16 18:45:12 nine postfix/smtps/smtpd[2260606]: connect from unknown[13.105.117.158]
May 16 18:45:12 nine postfix/smtps/smtpd[2263756]: proxy-accept: END-OF-MESSAGE: 250 OK; from=<ci-eahqgf@nine.testrun.org> to=<ci-fzmd83@nine.testrun.org> proto=ESMTP helo=<[127.0.0.1]> sasl_username=<ci-eahqgf@nine.testrun.org>
May 16 18:45:12 nine postfix/smtps/smtpd[2261067]: timeout after END-OF-MESSAGE from unknown[20.55.214.161]
May 16 18:45:12 nine postfix/smtps/smtpd[2261067]: disconnect from unknown[20.55.214.161] ehlo=1 auth=1 mail=4 rcpt=4 data=4 commands=14
May 16 18:45:12 nine postfix/smtps/smtpd[2260505]: disconnect from unknown[13.105.117.158] ehlo=1 auth=1 quit=1 commands=3
May 16 18:45:12 nine postfix/smtps/smtpd[2260660]: proxy-accept: END-OF-MESSAGE: 250 OK; from=<ci-sh4mt9@nine.testrun.org> to=<ci-uk3pqg@nine.testrun.org> proto=ESMTP helo=<[127.0.0.1]> sasl_username=<ci-sh4mt9@nine.testrun.org>
May 16 18:45:12 nine postfix/smtps/smtpd[2260543]: disconnect from unknown[13.105.117.158] ehlo=1 auth=1 quit=1 commands=3
May 16 18:45:12 nine postfix/smtps/smtpd[2260562]: proxy-accept: END-OF-MESSAGE: 250 OK; from=<ci-etgxhz@nine.testrun.org> to=<ci-9qya98@nine.testrun.org> proto=ESMTP helo=<[127.0.0.1]> sasl_username=<ci-etgxhz@nine.testrun.org>
May 16 18:45:12 nine postfix/smtps/smtpd[2260493]: disconnect from unknown[20.172.5.33] ehlo=1 auth=1 quit=1 commands=3
May 16 18:45:12 nine postfix/smtps/smtpd[2262495]: proxy-accept: END-OF-MESSAGE: 250 OK; from=<ci-nh4nna@nine.testrun.org> to=<ci-vxjzc9@nine.testrun.org> proto=ESMTP helo=<[127.0.0.1]> sasl_username=<ci-nh4nna@nine.testrun.org>
May 16 18:45:12 nine postfix/smtps/smtpd[2261473]: connect from unknown[13.105.117.179]
May 16 18:45:12 nine postfix/smtps/smtpd[2261130]: proxy-accept: END-OF-MESSAGE: 250 OK; from=<ci-r55nsv@nine.testrun.org> to=<ci-r55nsv@nine.testrun.org> proto=ESMTP helo=<[127.0.0.1]> sasl_username=<ci-r55nsv@nine.testrun.org>
May 16 18:45:12 nine postfix/smtps/smtpd[2260609]: NOQUEUE: client=unknown[172.183.98.99], sasl_method=PLAIN, sasl_username=ci-vg9tnv@nine.testrun.org
May 16 18:45:12 nine postfix/smtps/smtpd[2261083]: NOQUEUE: client=unknown[20.55.15.32], sasl_method=PLAIN, sasl_username=ci-du7a5v@nine.testrun.org
May 16 18:45:12 nine postfix/smtps/smtpd[2261916]: NOQUEUE: client=unknown[20.55.15.32], sasl_method=PLAIN, sasl_username=ci-mzssj3@nine.testrun.org
May 16 18:45:12 nine postfix/smtps/smtpd[2262160]: NOQUEUE: client=unknown[20.55.15.32], sasl_method=PLAIN, sasl_username=ci-zh4kga@nine.testrun.org
May 16 18:45:12 nine postfix/smtps/smtpd[2261507]: NOQUEUE: client=unknown[13.105.117.158], sasl_method=PLAIN, sasl_username=ci-hjz9aq@nine.testrun.org
May 16 18:45:12 nine postfix/smtps/smtpd[2261907]: connect from unknown[20.172.5.33]
May 16 18:45:12 nine postfix/smtps/smtpd[2261259]: disconnect from unknown[172.183.147.196] ehlo=1 auth=1 mail=1 rcpt=1 data=1 commands=5
May 16 18:45:12 nine postfix/smtps/smtpd[2261365]: connect from unknown[13.105.117.179]
May 16 18:45:12 nine postfix/smtps/smtpd[2263661]: NOQUEUE: client=unknown[172.183.147.196], sasl_method=PLAIN, sasl_username=ci-wke7zu@nine.testrun.org
May 16 18:45:12 nine postfix/smtps/smtpd[2260655]: connect from unknown[20.55.214.161]
May 16 18:45:12 nine postfix/smtps/smtpd[2260993]: proxy-accept: END-OF-MESSAGE: 250 OK; from=<ci-etpyj9@nine.testrun.org> to=<ci-rahy8r@nine.testrun.org> proto=ESMTP helo=<[127.0.0.1]> sasl_username=<ci-etpyj9@nine.testrun.org>

It seems the server rejects some messages when CI opens a lot of accounts and sends many messages. I will also open a core issue because the core should retry MDN sending instead of logging smtp has no messages to retry, waiting for interrupt and sleeping forever.