crooks / PyClean

A Usenet spamfilter written in Python
GNU General Public License v3.0

pyClean fails to filter, logs an error in traceback on FreeBSD 13.1 / INN 2.7.0rc1 / Python 3.8.13 #5

Open jrehmer opened 2 years ago

jrehmer commented 2 years ago

I noticed that one of my readers wasn't rejecting articles for which I had added specific bad_from entries. Looking for the cause, I found that pyClean does not filter when INN 2.7.0rc1 is compiled with Python 3.8 on FreeBSD 13.1. The filter starts up, normal items are logged in pyclean/log/pyclean.log, and it runs and logs its maintenance tasks, but no articles are filtered. pyclean/log/traceback contains the following error:

Traceback (most recent call last):
  File "/usr/local/news/bin/filter/filter_innd.py", line 324, in filter_art
    return self.pyfilter.filter(art)
  File "/usr/local/news/bin/filter/filter_innd.py", line 565, in filter
    post['from_email'] = self.addressParse(art['From'])
  File "/usr/local/news/bin/filter/filter_innd.py", line 1005, in addressParse
    name, email = parseaddr(addr)
  File "/usr/local/lib/python3.8/email/utils.py", line 212, in parseaddr
    addrs = _AddressList(addr).addresslist
  File "/usr/local/lib/python3.8/email/_parseaddr.py", line 511, in __init__
    self.addresslist = self.getaddrlist()
  File "/usr/local/lib/python3.8/email/_parseaddr.py", line 255, in getaddrlist
    ad = self.getaddress()
  File "/usr/local/lib/python3.8/email/_parseaddr.py", line 265, in getaddress
    self.gotonext()
  File "/usr/local/lib/python3.8/email/_parseaddr.py", line 238, in gotonext
    if self.field[self.pos] in self.LWS + '\n\r':
TypeError: 'in <string>' requires string as left operand, not int

I'm not a Python guy and could not figure out exactly why, but after I installed the python27 (2.7.18) package and recompiled INN, everything worked as expected. I was able to reproduce this behavior on multiple FreeBSD 13.1 servers.

OS Version and Packages:

root@feed1:/usr/local/news/inn-2.7.0rc1 # uname -mrs
FreeBSD 13.1-RELEASE amd64

python27-2.7.18_1              Interpreted object-oriented programming language
python38-3.8.13                Interpreted object-oriented programming language

yamo-nntp commented 2 years ago

I have the same errors on Debian Stable... https://news2web.pasdenom.info/article.php?id=3001&group=fr.comp.usenet.serveurs

Julien-Elie commented 2 years ago

The current version of PyClean does not work with Python 3.x.

I have spent the past few days trying to make it work with Python 3.x, as it would be very unfortunate if this great PyClean filter hook for INN could no longer be used (Python 2.x is now deprecated). I believe I have succeeded. As far as I can see, the upgraded version works fine and has not raised any exception for a day. Of course I'll keep an eye on how it performs and fix any issue I see with the changes I made.

I've opened a pull request (#8) with my proposed changes. Feel free to try this new version and report any issue you may encounter.

As an example of a fix, the error seen by @yamo-nntp and @jrehmer is solved by changing post['from_email'] = self.addressParse(art['From']) to post['from_email'] = self.addressParse(art['From'].tobytes().decode()), because Python 3.x handles Unicode strings completely differently from Python 2.x: the header value has to be converted to a str before parseaddr can process it.
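
The conversion described above can be illustrated with a minimal sketch. It assumes the header value reaches the filter as a memoryview (which is what the .tobytes() call suggests); header_to_str and the example address are hypothetical and not taken from PyClean.

```python
from email.utils import parseaddr


def header_to_str(value):
    """Return a header value as str (hypothetical helper, not PyClean code)."""
    if isinstance(value, memoryview):
        value = value.tobytes()
    if isinstance(value, bytes):
        # errors="replace" is a choice made for this sketch so that a
        # malformed header cannot raise UnicodeDecodeError.
        return value.decode("utf-8", errors="replace")
    return value


# Hypothetical From header, wrapped the way the filter is assumed to get it.
raw_from = memoryview(b"Julien <julien@example.invalid>")

# Indexing a bytes-like object yields an int, which is what the
# "requires string as left operand, not int" TypeError complains about.
print(type(raw_from[0]).__name__)

# Once decoded to str, parseaddr can split the name and the address.
name, email = parseaddr(header_to_str(raw_from))
print(name, email)
```

Whether a strict decode() or a tolerant one is preferable depends on how the filter should treat headers that are not valid UTF-8; the tolerant variant is used here only to keep the sketch from raising.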

The upgraded PyClean version correctly rejects articles:

2022-09-02 19:21:24 INFO reject: mid=<tessg4$166jn$2@news.trigofacile.com>, reason=Bad From (Julien)

with:

/Julien/ 20250101

in the bad_from configuration file (remember that the syntax is a regular expression followed by an expiration date in YYYYMMDD format).
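
To make the rule syntax concrete, here is a rough sketch of how such a line could be read. It is not PyClean's actual parser: parse_rule and RULE_LINE are hypothetical names, and the sketch assumes the slashes delimit the regular expression and that a rule stays valid through its expiration date.

```python
import re
from datetime import date, datetime

# Assumed layout: /regex/ followed by whitespace and an 8-digit date.
RULE_LINE = re.compile(r"^/(?P<regex>.*)/\s+(?P<expires>\d{8})\s*$")


def parse_rule(line, today):
    """Return a compiled regex, or None if the line is malformed or expired."""
    match = RULE_LINE.match(line.strip())
    if match is None:
        return None
    try:
        expires = datetime.strptime(match.group("expires"), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    if expires < today:
        return None
    return re.compile(match.group("regex"))


rule = parse_rule("/Julien/ 20250101", today=date(2022, 9, 2))
print(bool(rule.search("Julien <julien@example.invalid>")))
print(parse_rule("/Julien/ 20250101", today=date(2025, 1, 2)))
```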

jrehmer commented 1 year ago

With Python 3.9 I get the following in init_traceback (with the latest pull here):

Traceback (most recent call last):
  File "/usr/local/news/bin/filter/filter_innd.py", line 249, in __init__
    self.pyfilter = Filter()
  File "/usr/local/news/bin/filter/filter_innd.py", line 578, in __init__
    self.hourly_events(startup=True)
  File "/usr/local/news/bin/filter/filter_innd.py", line 1145, in hourly_events
    new_regex = self.regex_file(fn)
  File "/usr/local/news/bin/filter/filter_innd.py", line 1242, in regex_file
    return re.compile(regex)
  File "/usr/local/lib/python3.9/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/local/lib/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/local/lib/python3.9/sre_compile.py", line 788, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/local/lib/python3.9/sre_parse.py", line 955, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/local/lib/python3.9/sre_parse.py", line 444, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/local/lib/python3.9/sre_parse.py", line 526, in _parse
    code = _escape(source, this, state)
  File "/usr/local/lib/python3.9/sre_parse.py", line 427, in _escape
    raise source.error("bad escape %s" % escape, len(escape))
re.error: bad escape \I at position 1112

In traceback:

Traceback (most recent call last):
  File "/usr/local/news/bin/filter/filter_innd.py", line 349, in filter_art
    return self.pyfilter.filter(art)
AttributeError: 'InndFilter' object has no attribute 'pyfilter'

Julien-Elie commented 1 year ago

It seems that the regular expression pyClean tries to read from the configuration file is not properly formatted.

File "/usr/local/news/bin/filter/filter_innd.py", line 1242, in regex_file
    return re.compile(regex)

Just before the failure, pyClean normally logs in pyclean.log.2023-... something like INFO Compiled 2 rules from local_hosts, which gives a hint as to which file was not parsed correctly. Either the contents of the file are wrong or there's a genuine bug in how pyClean parses it. We cannot know without further information. At least the bad_from syntax I tested in my previous message in this thread works. Is it the same syntax you used?

crooks commented 1 year ago

I've slightly restructured the regex_file function to be more tolerant of failed Regular Expression compilations. It should now log the reason for the compilation failure instead of producing a traceback.
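
The general idea of tolerating a failed compilation can be sketched as follows. This is only an illustration of the technique, not the actual regex_file code; compile_rules and the sample patterns are hypothetical.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)


def compile_rules(patterns):
    """Compile each pattern, logging and skipping the ones that fail."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern))
        except re.error as exc:
            # Log the reason instead of letting the exception propagate.
            logging.warning("Skipping invalid regex %r: %s", pattern, exc)
    return compiled


# The second pattern contains \I, an unknown escape that Python 3
# rejects with a "bad escape" error like the one in the traceback.
rules = compile_rules([r"Julien", r"C:\Invalid"])
print(len(rules))
```

Catching the error per pattern keeps one bad entry from disabling the whole filter, at the cost of silently running with fewer rules unless the log is checked.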

jrehmer commented 10 months ago

After fixing the regex issues I have the following errors which seem to be related to having specific values set in ~/pyclean/etc/pyclean.cfg for max_crosspost and lines_allowed:

Traceback (most recent call last):
  File "/usr/local/news/bin/filter/filter_innd.py", line 349, in filter_art
    return self.pyfilter.filter(art)
  File "/usr/local/news/bin/filter/filter_innd.py", line 725, in filter
    if self.groups['count'] > config.get('groups', 'max_crosspost'):
TypeError: '>' not supported between instances of 'int' and 'str'

Traceback (most recent call last):
  File "/usr/local/news/bin/filter/filter_innd.py", line 349, in filter_art
    return self.pyfilter.filter(art)
  File "/usr/local/news/bin/filter/filter_innd.py", line 926, in filter
    isbin = self.binary.isbin(art)
  File "/usr/local/news/bin/filter/filter_innd.py", line 466, in isbin
    if b64match > config.get('binary', 'lines_allowed'):
TypeError: '>' not supported between instances of 'int' and 'str'

Julien-Elie commented 10 months ago

Does it work better with config.getint instead of config.get?

Besides the 2 you mention, there would also be 4 other occurrences to fix:

if suspect > config.get('binary', 'lines_allowed'):

if self.groups['count'] > config.get('groups',
                                     'max_low_crosspost'):

int(art[__LINES__]) <= config.get('logging',
                                  'logart_maxlines')):

maxlines = config.get('logging', 'logart_maxlines')
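
The difference between the two calls can be seen with the standard configparser module. This is a minimal sketch: the section and option names come from the tracebacks above, while the values 10 and 15 and the group_count variable are made up for the example. Python 2 allowed ordering comparisons between int and str, which is probably why the same code ran without error under 2.7.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[groups]
max_crosspost = 10

[binary]
lines_allowed = 15
""")

raw = config.get("groups", "max_crosspost")       # the str '10'
limit = config.getint("groups", "max_crosspost")  # the int 10

group_count = 12  # hypothetical crosspost count
try:
    too_many = group_count > raw
except TypeError as exc:
    # Python 3 refuses to order an int against a str.
    print("config.get:", exc)

# With getint the comparison is int against int and works as intended.
print("config.getint:", group_count > limit)
```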

jrehmer commented 10 months ago

Thanks Julien, that resolves the error.