Closed by stephdl 4 years ago.
I'd change the expectation. Timeouts can happen; that's normal. The issue is in the way soft rejection is handled. Here:
This is the output of rspamc.
Finally, this is where we have to catch the "soft reject" and return an exit code that tells getmail to "try again later".
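A minimal sketch of that check follows. It assumes a hypothetical filter script that getmail runs on each message and that receives rspamc's output (the message headers, including X-Spam-Action) on stdin. The function names and the exit code 75 (EX_TEMPFAIL in sysexits.h) used for "try again later" are assumptions, not the actual fix; which codes getmail treats as keep, drop, or error depends on its filter configuration.

```python
import sys
from typing import Optional

# Assumed exit codes (hypothetical): 0 lets getmail deliver the message,
# 75 (EX_TEMPFAIL in sysexits.h) is meant to say "try again later".
EXIT_DELIVER = 0
EXIT_TEMPFAIL = 75


def spam_action(rspamc_output: str) -> Optional[str]:
    """Return the lower-cased X-Spam-Action value, or None if it is missing."""
    for line in rspamc_output.splitlines():
        if line == "":
            break  # a blank line ends the header block; ignore the body
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "x-spam-action":
            return value.strip().lower()
    return None


def exit_code_for(action: Optional[str]) -> int:
    """Conservative policy (assumption): no verdict or a soft reject retries later."""
    if action is None or action == "soft reject":
        return EXIT_TEMPFAIL
    # Any other action (e.g. "add header") lets the message through;
    # later rules can still act on the headers rspamd added.
    return EXIT_DELIVER


def main() -> int:
    """Read rspamc output on stdin and return the exit code for getmail."""
    return exit_code_for(spam_action(sys.stdin.read()))
```

A wrapper script would end with `sys.exit(main())`. Treating a missing header like a soft reject follows the conservative approach discussed in this thread: without a verdict, the message should be neither delivered nor discarded.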
We probably need to fix rspamc-p3scan as well:
I'd change the expectation. Timeouts can happen; that's normal. The issue is in the way soft rejection is handled.
Timeouts can obviously occur, but I wonder whether a soft reject on task_timeout opens a Pandora's box. I would be interested in what happens if the task_timeout is not soft rejected: is the message probably rejected?
What task are we talking about? In this case the "task" is running the AV check. If it does not reach a verdict, the best thing to do is retry. I want to be conservative and wait until the AV is working correctly again.
I can't foresee which other "external" checks may run. A remote sandbox? Remote message fingerprints? There can be many. In general, if a check does not finish in a short time, the message must be neither delivered nor discarded.
This is the commit that introduced task_timeout with soft reject. It is rspamd's internal task timeout; I bet ClamAV and rspamc have their own timeouts.
If your solution is workable, go for it.
Additional information
With nethserver-mail-filter-2.11.2-1.ns7.noarch (soft_reject_on_timeout = true) and a frozen clamd process, this is what happens.
Using an SMTP client (curl):
451 4.7.1 Cannot validate the message now. Try again later
Running the rspamc client (as getmail does):
X-Spam-Action: soft reject
If I set soft_reject_on_timeout = false and clamd is still frozen:
Using SMTP, the result is the same as with soft_reject_on_timeout = true:
Using rspamc the header changes:
X-Spam-Action: add header
In conclusion:
soft_reject_on_timeout = false does not give rspamc enough information: getmail has no way to know that the AV check was skipped because of a timeout. Instead, soft_reject_on_timeout = true returns a "soft reject" header, which gives enough context to abort the delivery and try again later. There must be some rspamd internals that set the timeout clocks differently depending on the client (rspamc vs. SMTP).
If I set task_timeout = 20, which is more than the antivirus default timeout (3x5s):
With SMTP:
451 4.7.1 Cannot validate the message now. Try again later
With rspamc:
X-Spam-Action: soft reject
X-Spam-Scan-Time: 15.045
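The arithmetic behind this test can be sketched as follows, assuming "3x5s" means three attempts with a 5 s timeout each. The function names are hypothetical and do not correspond to rspamd settings. Under that assumption, with task_timeout = 20 the antivirus check gives up first, at about 15 s, which is consistent with the reported X-Spam-Scan-Time of 15.045; with the 8 s timeout seen in the original report, the task timer fires first.

```python
def av_worst_case(attempts: int, per_attempt_s: float) -> float:
    """Longest time the AV check can wait for clamd (assumed model: attempts x timeout)."""
    return attempts * per_attempt_s


def first_timeout(task_timeout_s: float, attempts: int, per_attempt_s: float) -> str:
    """Say which timer fires first when clamd never answers."""
    av = av_worst_case(attempts, per_attempt_s)
    return "antivirus" if av < task_timeout_s else "task"


print(av_worst_case(3, 5))      # 15
print(first_timeout(20, 3, 5))  # antivirus
print(first_timeout(8, 3, 5))   # task
```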
in 7.7.1908/testing:
Test cases
Check that the bug is not reproducible: as long as clamd is blocked by SIGSTOP, the messages must be left on the server and never expunged.
Additional checks
While clamd is blocked by SIGSTOP:
Send SIGCONT to clamd and verify that the messages are delivered correctly via both SMTP and getmail:
Ultimate tests
Other (rare) use cases that must return a temporary failure (soft reject):
Finally, test that the fix also works with p3scan.
Useful commands
redis-cli -s /var/run/redis-rspamd/rspamd FLUSHALL
kill -STOP <pidofclamd>
kill -CONT <pidofclamd>
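The same freeze/thaw steps can be scripted, for example to keep clamd blocked for a fixed window while test messages are sent and getmail runs. This is a minimal sketch using Python's os.kill with SIGSTOP and SIGCONT, the same signals as the kill commands above; the helper name `frozen` is hypothetical, finding clamd's PID is left to the caller, and it needs the same privileges as the shell commands.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def frozen(pid: int):
    """Block a process with SIGSTOP and always resume it with SIGCONT.

    Equivalent to `kill -STOP <pid>` on entry and `kill -CONT <pid>` on exit,
    even if the code inside the `with` block raises.
    """
    os.kill(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        os.kill(pid, signal.SIGCONT)
```

Usage: `with frozen(clamd_pid): ...` where the body sends test mail and runs getmail; clamd is resumed automatically when the block ends, which covers the SIGCONT check as well.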
QA
Verified with getmail.
Finally, test that the fix also works with p3scan.
The bug is not reproducible with p3scan, because the AV check runs separately with clamdscan. When clamd reloads the DB, p3scan simply waits until it finishes. Rspamd is contacted after clamd, so we can be almost sure that clamd is working correctly at that point.
in 7.7.1908/testing:
in 7.7.1908/testing:
in 7.7.1908/updates:
If getmail retrieves a message while ClamAV is reloading its signature DB, the message is discarded.
Steps to reproduce
Expected behavior
I expect the message not to be discarded. If ClamAV is not responsive, getmail must behave like a real MTA and try again later.
Actual behavior
When getmail fetches a message, an 8 s timeout occurs and rspamd soft rejects the email.
The email is discarded by the before.sieve rules. This is the /var/log/maillog evidence:
Components
nethserver-mail-server-2.11.2-1.ns7.noarch
nethserver-mail-smarthost-2.11.2-1.ns7.noarch
nethserver-mail-common-2.11.2-1.ns7.noarch
nethserver-mail-filter-2.11.2-1.ns7.noarch
nethserver-mail-getmail-2.11.2-1.ns7.noarch
See also
Thanks to Wayne Bilger.