Stop backscatter from Alaveteli

olineham commented 6 years ago

The default behaviour of Alaveteli is to send fake bounces for mail addressed to closed requests, i.e. backscatter. (Fake because Alaveteli has no way to know the true sender.) This is unacceptable to us as it ruins our mail reputation and hence overall deliverability (as well as making us bad internet citizens).

In the short term this has been eliminated by updating all requests in the database from bounce (the column default) to holding_pen at the expense of a large increase of requests to deal with in the holding pen.
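For reference, the short-term change amounts to a single bulk update. The sketch below reproduces it against an in-memory SQLite table rather than the real database. The table name `info_requests` and column name `handle_rejected_responses` are assumptions made for illustration (only the values `bounce` and `holding_pen` come from this issue), and the real schema has many more columns.

```python
import sqlite3


def move_bounces_to_holding_pen(conn):
    """Switch every request that would fake-bounce over to the holding pen.

    Table and column names are assumed for illustration only.
    Returns the number of rows changed.
    """
    cur = conn.execute(
        "UPDATE info_requests "
        "SET handle_rejected_responses = 'holding_pen' "
        "WHERE handle_rejected_responses = 'bounce'"
    )
    conn.commit()
    return cur.rowcount


# Minimal stand-in table with 'bounce' as the column default.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE info_requests ("
    "id INTEGER PRIMARY KEY, "
    "handle_rejected_responses TEXT NOT NULL DEFAULT 'bounce')"
)
conn.executemany(
    "INSERT INTO info_requests (handle_rejected_responses) VALUES (?)",
    [("bounce",), ("bounce",), ("holding_pen",)],
)

changed = move_bounces_to_holding_pen(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM info_requests "
    "WHERE handle_rejected_responses = 'bounce'"
).fetchone()[0]
```

Note that this only changes existing rows. New requests would still pick up the column default unless that is changed as well.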

There are a few ways we could deal with this in the long term, to be discussed below.

See also upstream issue https://github.com/mysociety/alaveteli/issues/217

olineham commented 6 years ago

I think it's worth outlining high level requirements. Feel free to edit this. See also comment.

R1. We must not backscatter

R2. Real responses must never be lost silently

They must be:

1. accepted and filed with the real request, or
1. rejected and communicated effectively to the real sender, or
1. accepted and filed with the holding_pen.

R3. The holding pen volume must be manageable.

Sending everything to holding_pen instead of fake bouncing is not manageable.

Note: Invalid recipient aliases (e.g. no such request ID or invalid hash) may be routed to holding_pen (like today) or rejected by the MTA, but this is outside the scope of this issue as we never did fake-bounce these. See #9. Most of these are from spammers who mis-OCR'd a PDF, but a significant number are typos by authorities.

olineham commented 6 years ago

Sources of spam and anti-spam measures

Since approx 2016, spammers started more frequently harvesting email addresses from image PDFs using OCR. This is now the majority of our spam.

To a lesser extent, spammers have got hold of our addresses through compromise of authority systems. Prior to about 2016 this was the more common source of our spam and thus it was infrequent. Holding pen was an acceptable approach then.

Aging of requests, so that they accept responses from authority_only (once old months old) or from nobody (once very_old months old), is an anti-spam measure. We should not assume we need to keep this aging approach as is.

"Authority only" was an anti-spam measure. It's not a very good one, as it's now pretty common for responses to come from a different domain than the authority address (whether due to a transfer, or multiple domains used by the authority). We should not assume we need to keep this "authority only" approach as is.
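To make the current policy concrete, here is a rough sketch of the aging rule and of why a domain comparison is a weak "authority only" test. It is an illustration, not Alaveteli's actual code: the function names, the `anybody` state for young requests, the 6 and 12 month thresholds, and the example addresses are all assumptions.

```python
def allowed_senders(age_months, old_months, very_old_months):
    """Return who may respond to a request of the given age.

    'authority_only' and 'nobody' are the states named in this issue;
    'anybody' is an assumed name for requests younger than old_months.
    """
    if age_months >= very_old_months:
        return "nobody"
    if age_months >= old_months:
        return "authority_only"
    return "anybody"


def sender_matches_authority(sender, authority_email):
    """Naive 'authority only' check: compare the domain parts only."""
    sender_domain = sender.rsplit("@", 1)[-1].lower()
    authority_domain = authority_email.rsplit("@", 1)[-1].lower()
    return sender_domain == authority_domain


# Illustrative thresholds only; the real values are configurable.
OLD, VERY_OLD = 6, 12

young = allowed_senders(3, OLD, VERY_OLD)
middle = allowed_senders(8, OLD, VERY_OLD)
ancient = allowed_senders(30, OLD, VERY_OLD)

# A genuine reply sent from a second domain used by the same authority
# fails the naive check, which is the weakness described above.
same_domain = sender_matches_authority("foi@council.example", "info@council.example")
other_domain = sender_matches_authority("foi@shared-services.example", "info@council.example")
```

Under these assumptions a late Ombudsman-driven reply to a very old request falls into `nobody`, which is exactly the R2 case that must not be lost silently.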

We should consider whether completely different approaches to anti-spam might be effective. For example, DNSBLs could reject the worst mail at SMTP time, and SpamAssassin could set scores to help decide between routing to real requests or to the holding pen.
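As a rough sketch of those two ideas: a DNSBL check conventionally looks up the connecting IPv4 address with its octets reversed under the list's zone, and a spam score can pick the destination. The zone `dnsbl.example`, the threshold of 5.0, and both function names are placeholders for illustration. In practice the MTA and the spam filter would do this work themselves rather than custom code.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone.

    The octets are reversed and the zone appended; by convention the
    address is listed if this name resolves. No lookup is done here.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if not valid IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def route_by_score(score, holding_pen_threshold):
    """Send high-scoring mail to the holding pen, the rest to the request."""
    if score >= holding_pen_threshold:
        return "holding_pen"
    return "request"


# Placeholder zone and threshold, for illustration only.
query = dnsbl_query_name("192.0.2.10", "dnsbl.example")
clean = route_by_score(1.2, 5.0)
suspect = route_by_score(7.5, 5.0)
```

Rejecting on a DNSBL hit happens during the SMTP conversation, so the sending server reports the failure to the real sender and no backscatter is generated.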

olineham commented 6 years ago

Regarding R2, rarely but importantly, sometimes a very old request receives a real response after eventual intervention by the Ombudsman. It's important these responses either end up in holding_pen or have an effective bounce message to the sender.

Regarding R2 (ii): I'm not sure how we can send a full and readable explanation of the situation at SMTP time. At best we'd be able to send something in a 5.5.4 response, but would this communicate clearly enough to the authority that they need to contact us to ask for the address to be re-opened?

nigeljonez commented 5 years ago

Assigning this. The target is to get a proper solution running when we update our instance.

Of note:

mysociety/alaveteli@fedd73a72c implemented rake tasks for updating requests with MTA rejection policies. This may be an alternative to patching Postfix into a database view (although our current infrastructure means we can't easily do this at the moment via docker volumes).
Getting a solution to mysociety/alaveteli#217 should reduce the remaining bounces.
