Open jrehmer opened 2 years ago
I have the same errors on Debian Stable: https://news2web.pasdenom.info/article.php?id=3001&group=fr.comp.usenet.serveurs
The current version of PyClean does not work with Python 3.x.
I spent the past few days trying to make it work with Python 3.x, as it would be very unfortunate if this great PyClean filter hook for INN could no longer be used (Python 2.x is now deprecated). I believe I succeeded. As far as I can see, the upgraded version works fine and has not raised any exception for a day. Of course, I'll keep an eye on how it performs and fix any issue I see with the changes I made.
I've opened a pull request (#8) with my proposed changes. Feel free to try this new version and report any issue you may encounter.
As an example of a fix, the error seen by @yamo-nntp and @jrehmer is solved by changing
post['from_email'] = self.addressParse(art['From'])
to
post['from_email'] = self.addressParse(art['From']).tobytes().decode()
because Python 3.x handles Unicode strings very differently from Python 2.x.
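A minimal sketch of why the decode step matters, assuming the value is a memoryview, as the .tobytes() call in the fix suggests. The sample address is made up for illustration.

```python
# Assumption: the value arrives as a bytes-like object (here a memoryview),
# which is what the .tobytes() call in the fix implies.
raw_from = memoryview(b"Julien <julien@example.invalid>")

# In Python 3, bytes-like objects and str are distinct types, so the value
# has to be decoded before it can be treated as text.
decoded_from = raw_from.tobytes().decode()  # decode() defaults to UTF-8

print(type(raw_from).__name__)      # memoryview
print(type(decoded_from).__name__)  # str
print(decoded_from)
```

Headers that are not valid UTF-8 would make decode() raise UnicodeDecodeError, so a real filter may want an explicit error handler such as errors="replace".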
The upgraded PyClean version correctly rejects articles:
2022-09-02 19:21:24 INFO reject: mid=<tessg4$166jn$2@news.trigofacile.com>, reason=Bad From (Julien)
with:
/Julien/ 20250101
in the bad_from configuration file (remember that the syntax is a regular expression followed by an expiration date in YYYYMMDD format).
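To illustrate the rule syntax described above, here is a rough sketch of how a line such as /Julien/ 20250101 could be split into a regular expression and an expiration date. The parse_rule_line helper is hypothetical and is not PyClean's actual parser.

```python
import re
from datetime import datetime

def parse_rule_line(line):
    """Hypothetical helper: split '/regex/ YYYYMMDD' into (pattern, expiry)."""
    match = re.match(r"^/(.+)/\s+(\d{8})\s*$", line.strip())
    if match is None:
        return None  # line does not follow the '/regex/ YYYYMMDD' shape
    pattern = re.compile(match.group(1))
    expiry = datetime.strptime(match.group(2), "%Y%m%d").date()
    return pattern, expiry

pattern, expiry = parse_rule_line("/Julien/ 20250101")
print(pattern.pattern, expiry)
print(bool(pattern.search("Julien <julien@example.invalid>")))
```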
With Python 3.9 I get the following in init_traceback (with the latest pull here):
Traceback (most recent call last):
File "/usr/local/news/bin/filter/filter_innd.py", line 249, in __init__
self.pyfilter = Filter()
File "/usr/local/news/bin/filter/filter_innd.py", line 578, in __init__
self.hourly_events(startup=True)
File "/usr/local/news/bin/filter/filter_innd.py", line 1145, in hourly_events
new_regex = self.regex_file(fn)
File "/usr/local/news/bin/filter/filter_innd.py", line 1242, in regex_file
return re.compile(regex)
File "/usr/local/lib/python3.9/re.py", line 252, in compile
return _compile(pattern, flags)
File "/usr/local/lib/python3.9/re.py", line 304, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/local/lib/python3.9/sre_compile.py", line 788, in compile
p = sre_parse.parse(p, flags)
File "/usr/local/lib/python3.9/sre_parse.py", line 955, in parse
p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
File "/usr/local/lib/python3.9/sre_parse.py", line 444, in _parse_sub
itemsappend(_parse(source, state, verbose, nested + 1,
File "/usr/local/lib/python3.9/sre_parse.py", line 526, in _parse
code = _escape(source, this, state)
File "/usr/local/lib/python3.9/sre_parse.py", line 427, in _escape
raise source.error("bad escape %s" % escape, len(escape))
re.error: bad escape \I at position 1112
In traceback:
Traceback (most recent call last):
File "/usr/local/news/bin/filter/filter_innd.py", line 349, in filter_art
return self.pyfilter.filter(art)
AttributeError: 'InndFilter' object has no attribute 'pyfilter'
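The second traceback looks like a consequence of the first: if Filter() raises inside __init__, self.pyfilter is never assigned. A simplified sketch of that cascade, assuming the startup exception is caught and recorded rather than propagated (the class bodies here are stand-ins, not the real filter_innd.py code):

```python
import traceback

class Filter:
    def __init__(self):
        # Stand-in for the regex compilation failure at startup.
        raise ValueError("simulated startup failure")

class InndFilter:
    def __init__(self):
        try:
            self.pyfilter = Filter()  # never assigned if Filter() raises
        except Exception:
            # Assumption: the error is recorded and the object lives on.
            self.init_error = traceback.format_exc()

    def filter_art(self, art):
        return self.pyfilter.filter(art)

innd = InndFilter()
try:
    innd.filter_art({})
except AttributeError as exc:
    print(exc)  # 'InndFilter' object has no attribute 'pyfilter'
```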
It seems that the regular expression pyClean tries to read from the configuration file is not properly formatted.
File "/usr/local/news/bin/filter/filter_innd.py", line 1242, in regex_file
return re.compile(regex)
Just before the failure, pyClean normally logs something like INFO Compiled 2 rules from local_hosts in pyclean.log.2023-..., which gives a hint as to which file was not parsed correctly.
Either the contents of the file are wrong or there's a genuine bug in how pyClean parses it. We cannot know without further information.
At least the bad_from syntax I tested in my previous message in this thread works. Is it the same one you used?
I've slightly restructured the regex_file function to be more tolerant of failed regular expression compilations. It should now log the reason for the compilation failure instead of producing a traceback.
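As a rough illustration of that idea (not the actual regex_file code), a tolerant compile step could catch re.error for each rule and log it. The compile_rules helper and the sample rules are hypothetical. The second rule contains \I, which Python 3 rejects as a bad escape.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)

def compile_rules(rules, source_name):
    """Hypothetical helper: compile each rule, logging failures instead of raising."""
    compiled = []
    for rule in rules:
        try:
            compiled.append(re.compile(rule))
        except re.error as exc:
            logging.warning("Skipping invalid regex in %s: %r (%s)",
                            source_name, rule, exc)
    logging.info("Compiled %d rules from %s", len(compiled), source_name)
    return compiled

# r"C:\Inetpub" contains the unknown escape \I and fails to compile.
rules = compile_rules([r"Julien", r"C:\Inetpub"], "bad_from")
```

In a rule file, a literal backslash would need to be written as \\ for the pattern to compile.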
After fixing the regex issues I have the following errors which seem to be related to having specific values set in ~/pyclean/etc/pyclean.cfg for max_crosspost and lines_allowed:
Traceback (most recent call last):
File "/usr/local/news/bin/filter/filter_innd.py", line 349, in filter_art
return self.pyfilter.filter(art)
File "/usr/local/news/bin/filter/filter_innd.py", line 725, in filter
if self.groups['count'] > config.get('groups', 'max_crosspost'):
TypeError: '>' not supported between instances of 'int' and 'str'
Traceback (most recent call last):
File "/usr/local/news/bin/filter/filter_innd.py", line 349, in filter_art
return self.pyfilter.filter(art)
File "/usr/local/news/bin/filter/filter_innd.py", line 926, in filter
isbin = self.binary.isbin(art)
File "/usr/local/news/bin/filter/filter_innd.py", line 466, in isbin
if b64match > config.get('binary', 'lines_allowed'):
TypeError: '>' not supported between instances of 'int' and 'str'
Does it work better with config.getint instead of config.get?
Besides the 2 you mention, there are also 4 other occurrences to fix:
if suspect > config.get('binary', 'lines_allowed'):
if self.groups['count'] > config.get('groups', 'max_low_crosspost'):
int(art[__LINES__]) <= config.get('logging', 'logart_maxlines')):
maxlines = config.get('logging', 'logart_maxlines')
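A short sketch of the difference, using the standard configparser module and a made-up value of 10 for max_crosspost: get returns a str, which Python 3 refuses to compare with an int, while getint converts the value first.

```python
import configparser

config = configparser.ConfigParser()
# Hypothetical configuration; the value 10 is only for illustration.
config.read_string("""
[groups]
max_crosspost = 10
""")

as_text = config.get('groups', 'max_crosspost')    # str
as_int = config.getint('groups', 'max_crosspost')  # int

count = 12
try:
    count > as_text
except TypeError as exc:
    print(exc)  # '>' not supported between instances of 'int' and 'str'

print(count > as_int)  # True
```

Python 2 allowed the int/str comparison without complaint, which is why these lines only started failing after the move to Python 3.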
Thanks Julien, that resolves the error.
I started to notice that one of my readers wasn't rejecting articles for which I had added specific bad_from entries. Looking for the cause, I found that pyClean was not filtering when INN 2.7.0rc1 is compiled with Python 3.8 on FreeBSD 13.1. The filter starts up and normal items are logged in pyclean/log/pyclean.log; it runs and logs its maintenance tasks, but no articles are filtered. I noticed the following error in pyclean/log/traceback:
I'm not a Python guy and was not able to figure out exactly why, but when I installed the python27 (2.7.18) package and recompiled INN, everything worked as expected. I was able to reproduce this behavior on multiple FreeBSD 13.1 servers.
OS Version and Packages: