akissa / spamc

Python spamassassin spamc client library
GNU Affero General Public License v3.0
6 stars 10 forks source link

RESPONSE_RE is too strict #9

Open duk3luk3 opened 7 years ago

duk3luk3 commented 7 years ago

Hello,

first, thanks for this great library. I'm planning to use it in ISBG (https://github.com/isbg/isbg) and am adapting it for Python 3 right now.

I am trying to use spamc against spamd from spamassassin 3.4.1. When spamd is started without -l / --allow-tell, it can return this response:

SPAMD/1.0 69 Service Unavailable: TELL commands are not enabled, set the --allow-tell switch.

This leads to spamc blowing up because RESPONSE_RE cannot match this response:

Traceback (most recent call last):
  File "/mnt/nfs/erlacher/scratch/venvs/isbg/bin/isbg", line 11, in <module>
    load_entry_point('isbg==1.0', 'console_scripts', 'isbg')()
  File "/mnt/nfs/erlacher/scratch/venvs/isbg/lib/python3.6/site-packages/isbg-1.0-py3.6.egg/isbg/isbg.py", line 809, in isbg_run
  File "/mnt/nfs/erlacher/scratch/venvs/isbg/lib/python3.6/site-packages/isbg-1.0-py3.6.egg/isbg/isbg.py", line 768, in do_isbg
  File "/mnt/nfs/erlacher/scratch/venvs/isbg/lib/python3.6/site-packages/isbg-1.0-py3.6.egg/isbg/isbg.py", line 638, in spamlearn
  File "/mnt/nfs/erlacher/scratch/venvs/isbg/lib/python3.6/site-packages/spamc-0.0.8-py3.6.egg/spamc/client.py", line 321, in learn
    resp = self.tell(msg, 'learn', learnas)
  File "/mnt/nfs/erlacher/scratch/venvs/isbg/lib/python3.6/site-packages/spamc-0.0.8-py3.6.egg/spamc/client.py", line 312, in tell
    return self.perform('TELL', msg, headers)
  File "/mnt/nfs/erlacher/scratch/venvs/isbg/lib/python3.6/site-packages/spamc-0.0.8-py3.6.egg/spamc/client.py", line 229, in perform
    return get_response(cmd, conn)
  File "/mnt/nfs/erlacher/scratch/venvs/isbg/lib/python3.6/site-packages/spamc-0.0.8-py3.6.egg/spamc/client.py", line 76, in get_response
    'spamd unrecognized response: %s' % data)
spamc.exceptions.SpamCResponseError: spamd unrecognized response: SPAMD/1.0 69 Service Unavailable: TELL commands are not enabled, set the --allow-tell switch.

Shouldn't the RESPONSE_RE be a lot more liberal with the "message" group, e.g. just allowing anything in it:

RESPONSE_RE = re.compile(r'^SPAMD/(?:[0-9\.]+)\s(?P<code>[0-9]+)'
                         r'\s(?P<message>.+)$')

Of course it would be good to have some kind of restriction of the charset, but since spamd doesn't really specify anything...

akissa commented 7 years ago

Hi

Technically that is not a TELL response thus the exception is generated. You should catch those exceptions in your code. Even if we made response more liberal we would still have to issue an exception since it is not a real TELL response that was returned.

duk3luk3 commented 7 years ago

It's definitely a TELL response because it's the response to a completely valid TELL request. It's an entirely expected part of the spamd protocol.

If spamc doesn't handle this, that in effect means that you've decided to only handle a subset of spamd's protocol.

What does spamc gain from not handling that response? There's no additional code in spamc required, the consuming code gets a perfectly well-formed result from spamc:

{'code': 69, 'message': 'Service Unavailable: TELL commands are not enabled, set the --allow-tell switch.', 'isspam': False, 'score': 0.0, 'basescore': 0.0, 'report': [], 'symbols': [], 'headers': {}, 'didset': False, 'didremove': False}
akissa commented 7 years ago

Okay, will look into that

duk3luk3 commented 7 years ago

Thank you! If you give me a few days I will have a PR ready (or two).

akissa commented 7 years ago

Great, the tests will need to be updated as well to cover this

duk3luk3 commented 7 years ago

OK, how do I run the tests? Do they mock spamd?

akissa commented 7 years ago

python setup.py nosetests

Yes, they mock spamd, that will have to be extended to generate the above response.

duk3luk3 commented 7 years ago

Great, I'll see if I can figure it out!

duk3luk3 commented 7 years ago

What is the minimum version of python you want to support?

akissa commented 7 years ago

2.7

duk3luk3 commented 7 years ago

I am confused about the do_HEADERS function at https://github.com/akissa/spamc/blob/master/tests/_s.py#L80.

It assembles the headers list, but then doesn't do anything with it. Why is that?

EDIT: So this results in just echoing the received headers back out. Couldn't do_HEADERS then just be:

    def do_HEADERS(self):
        """Emulate HEADERS"""
        content_length = int(self.headers.get('Content-length', 0))
        body = self.rfile.read(content_length)
        parts, = body.split(b'\r\n\r\n')
        self.wfile.write(b"SPAMD/1.5 0 EX_OK\r\n")
        self.wfile.write(b"Spam: True ; 15 / 5\r\n")
        if self.request_version >= (1, 3):
            self.wfile.write(b"Content-length: %d\r\n" % len(parts))
        self.wfile.write(b"\r\n\r\n")
        self.wfile.write(parts)
        self.close_connection = 1
akissa commented 7 years ago

You need to look at the documentation for StreamRequestHandler to see how that gets handled

duk3luk3 commented 7 years ago

I don't think this has anything to do with StreamRequestHandler, it's purely the implementation of the spamd HEADERS command that seems very weird to me in this function - making it hard for me to make it python3-compatible...

akissa commented 7 years ago

Okay maybe i misunderstood you, the code you are pointing to is not the implementation it is a test. Please clarify what the issue with compatibility is.

duk3luk3 commented 7 years ago

Well, I'd call it the mock implementation of spamd for tests to use.

The issue with compatibility is simply that I can't properly port code I don't understand.

It seems to me that maybe the (dead) header assembling code is left over from an old (erroneous) implementation that echoed the request headers back to the client instead of the headers of the content. That was fixed but for some reason the old code stuck around.

akissa commented 7 years ago

Have not had my coffee today, mixing up my trend of thoughts here. Yes the top part of the do_HEADERS needs to be commented out.

akissa commented 7 years ago

Removed in 24a80a35506e379061e5e6947ab2e57ef283b15d

duk3luk3 commented 7 years ago

Thank you for your continued help here. :-)

I'm also a little confused about parts, = body.split('\r\n\r\n') - is that meant to just strip \r\n\r\n from the end of body?

akissa commented 7 years ago

It is meant to extract the headers (Server) - https://github.com/apache/spamassassin/blob/trunk/spamd/PROTOCOL#L32

akissa commented 7 years ago

Link for client behaviour is https://github.com/apache/spamassassin/blob/trunk/spamd/PROTOCOL#L42

duk3luk3 commented 7 years ago

You linked to a description of the spamd protocol, but body is actually the embedded email - which doesn't have \r\n in it at all (It uses just \n newlines - https://github.com/akissa/spamc/blob/master/examples/sample-spam.txt - and those aren't converted to \r\n anywhere). It's just terminated with \r\n\r\n by spamc. (It's interesting by itself that the self.rfile.read(content_length) actually hits that terminating \r\n\r\n...)

If you add

        print(repr(body))                                                                                                                                                                                                                                    
        print('---')                                                                                                                                                                                                                                         
        print(repr(parts))                                                                                                                                                                                                                                   

after

        parts, = body.split('\r\n\r\n')                                                                                                                                                                                                                      

and run python setup.py nosetests -s you will see that parts is the whole test mail body...

akissa commented 7 years ago

You are right once again, am totally rusty with this. That code on inspection is supposed to calculate what we send back to the client in the Content-length header. I cannot remember why i implemented it that way as opposed to how it is done in do_PROCESS.

When i get sometime will have to read through the code again.

duk3luk3 commented 7 years ago

Now that my confusion is cleared up I think I'll be able to get you a very nice clean new implementation :-)

akissa commented 7 years ago

P.S you can test aganist an actual spamd server by setting the SPAMD_HOST and SPAMD_PORT environment variables.