SpamExperts / pyzor

Pyzor is a Python implementation of a spam-blocking networked system that uses spam signatures to identify spam.
GNU General Public License v2.0

Recognize GTUBE string as test spam #147

Open sidney opened 1 year ago

sidney commented 1 year ago

This is an enhancement request that I have a PR for.

As a developer of Apache SpamAssassin, which has a plugin for using pyzor, I have a problem with our functional test for the plugin, which is supposed to submit a test spam message to pyzor check and verify that it comes back with a positive result. The problem is that if we use a real-world spam, it gets expired from the server after a few months. I don't think that it would be right for us to pollute the server with an arbitrary email that is not really currently spam in the wild just to run a test.

GTUBE is a standard test string that any spam detection or filtering software can use as a convenient way to test installations. See https://spamassassin.apache.org/gtube/ for details. The standard says that any email that contains that string is treated as spam by the spam filtering software.

Because the GTUBE string is 68 characters long and contains no whitespace, pyzor's digest algorithm currently filters it out, so it is not even possible to report a GTUBE test spam email in a meaningful way.

The enhancement in the PR I'll submit changes digest.py so that it detects the 68-character GTUBE string and short-circuits the hash calculation, so that both the pre-digest result and the hash are computed from just that one string. As a result, every message containing the GTUBE string gets the same unique hash.
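To make the idea concrete, here is a minimal sketch of the client-side short-circuit. It is not the actual patch: `normalize`, `digest`, and the `LONG_TOKEN` pattern are hypothetical stand-ins for what digest.py does, and the use of SHA-1 and the "drop runs of 10 or more non-whitespace characters" rule are assumptions made for illustration.

```python
import hashlib
import re

# The GTUBE test string as published by Apache SpamAssassin (68 characters).
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"

# Assumption: the digest drops long runs of non-whitespace characters.
# This is the kind of rule that removes GTUBE before hashing.
LONG_TOKEN = re.compile(r"\S{10,}")


def normalize(text):
    """Hypothetical pre-digest step: drop long tokens, then all whitespace."""
    text = LONG_TOKEN.sub("", text)
    return "".join(text.split())


def digest(text):
    """Hash the message, short-circuiting when the GTUBE string is present."""
    if GTUBE in text:
        # The pre-digest is just the GTUBE string, so every GTUBE message
        # produces the same hash regardless of its other content.
        predigest = GTUBE
    else:
        predigest = normalize(text)
    return hashlib.sha1(predigest.encode("utf-8")).hexdigest()
```

With this sketch, `digest("Subject: test\n\n" + GTUBE)` and `digest("some other body " + GTUBE + " trailing text")` return the same value, while without the short-circuit `normalize` would have removed the GTUBE string entirely.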

The other part of the PR is in the server-side check function: it detects the GTUBE hash in a request and bypasses the database lookup, returning the same maximum/0 result that pong does. Ideally servers would update to the latest version of the software, but any that don't will still work with this scheme as long as GTUBE test emails are reported to them at least every few months, which makes the client-side change backward compatible with legacy servers.
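The server-side part could look roughly like the following. This is only a sketch under assumptions: `check`, `GTUBE_DIGEST`, `MAX_COUNT`, the dictionary-based `database`, and the `Count` / `WL-Count` response keys are hypothetical names chosen for illustration, and `sys.maxsize` stands in for whatever "maximum" value pong reports.

```python
import hashlib
import sys

GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"

# Assumption: the client hashes the bare GTUBE string with SHA-1.
GTUBE_DIGEST = hashlib.sha1(GTUBE.encode("ascii")).hexdigest()

# Assumption: stand-in for the "maximum" report count that pong returns.
MAX_COUNT = sys.maxsize


def check(digest, database):
    """Return report and whitelist counts for a digest.

    `database` is a hypothetical mapping of digest -> (count, wl_count).
    """
    if digest == GTUBE_DIGEST:
        # Bypass the lookup: GTUBE is always reported as maximum spam
        # with a whitelist count of 0, independent of database contents.
        return {"Count": MAX_COUNT, "WL-Count": 0}
    count, wl_count = database.get(digest, (0, 0))
    return {"Count": count, "WL-Count": wl_count}
```

A server without this bypass would simply fall through to the lookup branch, which is why periodic reports of a GTUBE message are still needed there to keep the record from expiring.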