NethServer / dev

NethServer issue tracker
https://github.com/NethServer/dev/issues
63 stars 18 forks source link

POP3 connector discards mail during ClamAV reloads #6052

Closed stephdl closed 4 years ago

stephdl commented 4 years ago

If a message is retrieved by getmail during ClamAV signatures DB reload the message is discarded.

Steps to reproduce

Expected behavior

I expect that the message is not discarded. If ClamAV is not responsible getmail must behave like a real MTA and try again later.

Actual behavior

When getmail fetches a message a 8s timeout occurs and the email is soft rejected by rspamd.

The email is discarded by the before.sieve rules.

This is the /var/log/maillog evidence:

Feb  6 10:30:09 vm5 rspamd[32273]: <4bac8a>; csession; rspamd_task_write_log: id: <20191211152337.2E5AF1804DF6F@prometheus.de-labrusse.fr>, ip: 127.0.0.1, (default: F (soft reject): [1.00/20.00] [DATE_IN_PAST(1.00){},MIME_GOOD(-0.10){text/plain;message/rfc822;},RCVD_NO_TLS_LAST(0.10){},FROM_NEQ_ENVFROM(0.00){MAILER-DAEMON@prometheus.de-labrusse.fr;;},FROM_NO_DN(0.00){},MID_RHS_MATCH_FROM(0.00){},MIME_TRACE(0.00){0:~;1:+;2:~;3:+;4:+;},RCPT_COUNT_ONE(0.00){1;},RCVD_COUNT_THREE(0.00){4;},RCVD_VIA_SMTP_AUTH(0.00){},TO_DN_NONE(0.00){}]), len: 6204, time: 8009.044ms, dns req: 24, digest: <062c9020d3a1fc8dbfa356027d207961>, mime_rcpts: <davidep2@nethserver.org>, file: stdin, forced: soft reject "timeout processing message"; score=nan (set by task timeout)
Feb  6 10:30:09 vm5 rspamd[32273]: <4bac8a>; csession; rspamd_protocol_http_reply: regexp statistics: 0 pcre regexps scanned, 2 regexps matched, 184 regexps total, 94 regexps cached, 0B scanned using pcre, 7.05KiB scanned total
Feb  6 10:30:10 vm5 dovecot: lda(first.user@dpnet.nethesis.it): sieve: msgid=<20191211152337.2E5AF1804DF6F@prometheus.de-labrusse.fr>: marked message to be discarded if not explicitly delivered (discard action)
Feb  6 10:30:10 vm5 getmail: msg 13/13 (6311 bytes) msgid 1399300768/140 from <> delivered to MDA_external command dovecot-lda ()

Components

nethserver-mail-server-2.11.2-1.ns7.noarch nethserver-mail-smarthost-2.11.2-1.ns7.noarch nethserver-mail-common-2.11.2-1.ns7.noarch nethserver-mail-filter-2.11.2-1.ns7.noarch nethserver-mail-getmail-2.11.2-1.ns7.noarch

See also


thank wayne Bilger

DavidePrincipi commented 4 years ago

I'd change the expectation. Timeouts can happen, it's a normal. The issue is in the way soft rejection is handled. Here:

https://github.com/NethServer/nethserver-mail/blob/792018c14507e0924d1a3704568540dd9f4c13da/getmail/etc/e-smith/templates/var/lib/nethserver/sieve-scripts/before.sieve/30getmailFilter#L7

This is the output of rspamc.

And finally this is where we have to catch the "soft reject" and return an exit code to say to getmail "try again later".

https://github.com/NethServer/nethserver-mail/blob/792018c14507e0924d1a3704568540dd9f4c13da/getmail/usr/bin/rspamc-getmail#L24

We probably need to fix rspamc-p3scan as well:

https://github.com/NethServer/nethserver-mail/blob/792018c14507e0924d1a3704568540dd9f4c13da/p3scan/usr/bin/rspamc-p3scan#L53

stephdl commented 4 years ago

I'd change the expectation. Timeouts can happen, it's a normal. The issue is in the way soft rejection is handled.

Timeout can occur obviously but I wonder if a soft reject on the task_timout is not a pandora box, I would be interested on what it happens if the task_timeout is not soft rejected, probably rejected ?

DavidePrincipi commented 4 years ago

What task are we talking about? In this case the "task" is running the AV check. If it does not come to a verdict the best thing to do is retrying. I want to be conservative and wait until the AV is working correctly again.

I can't figure out what other "external" checks may run. Remote sandbox? Remote message fingerprints? There can be many. In general if a check does not finish in a short time the message has not to be delivered, neither discarded.

stephdl commented 4 years ago

this is the commit which introduced the task_timeout with soft reject, it is the internal task of rspamd, I bet that clamav and rspamc get their relevant timeout

https://github.com/rspamd/rspamd/commit/d3d19ad641185f0fdb380705cb999e51dbbb5e50#diff-ef47418ec6cd0b81677d6203f21738b6R320

If your solution is workable, go on it

DavidePrincipi commented 4 years ago

Additional information

With nethserver-mail-filter-2.11.2-1.ns7.noarch (soft_reject_on_timeout = true) and a freezed clamd process this is what happens.

Using a SMTP client (curl):

Running the rspamc client (like getmail)


If I set soft_reject_on_timeout = false and clamd is still freezed:

Using SMTP it's the same as soft_reject_on_timeout = true:

Using rspamc the header changes:


In conclusion:

DavidePrincipi commented 4 years ago

There must be some rspamd internals that set the timeout clocks differently, depending on the client rspamc vs SMTP.

If I set task_timeout = 20, that is more than the antivirus 3x5s default timeout

With SMTP

With rspamc:

X-Spam-Action: soft reject
X-Spam-Scan-Time: 15.045
nethbot commented 4 years ago

in 7.7.1908/testing:

DavidePrincipi commented 4 years ago

Test cases

Check the bug is not reproducible. As long as clamd is blocked by SIGSTOP the messages must be left on the server and never expunged.

Additional checks

While clamd is blocked by SIGSTOP:

Send SIGCONT to clamd and verify the messages are delivered correctly from both SMTP and getmail:

  1. non-spam messages are delivered to INBOX
  2. spam messages are delivered to Junk

Ultimate tests

Other (rare) use cases than must return temporary failure (soft reject):

  1. The clamd service is completely stopped
  2. The rspamd service is blocked by SIGSTOP (its socket is still reachable from Postfix)
  3. The rspamd service is completely stopped

Finally, test the fix works also with p3scan


Useful commands

  1. clear local rspamd AV verdicts cache and bayes DB (do not run in production!) redis-cli -s /var/run/redis-rspamd/rspamd FLUSHALL
  2. Block the clamd process kill -STOP <pidofclamd>
  3. Unblock the clamd process kill -CONT <pidofclamd>
stephdl commented 4 years ago

QA

Verified with getMail

DavidePrincipi commented 4 years ago

Finally, test the fix works also with p3scan

The bug is not reproducible with p3scan, because the AV check runs separately with clamdscan. When clamd reloads the DB, p3scan simply waits until it finishes. Rspamd is contacted after clamd: we are almost sure clamd works correctly at that time.

nethbot commented 4 years ago

in 7.7.1908/testing:

nethbot commented 4 years ago

in 7.7.1908/testing:

nethbot commented 4 years ago

in 7.7.1908/updates: