WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

wikiteam/dumpgenerator.py:2260: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal #435

Open cooperdk opened 2 years ago

cooperdk commented 2 years ago

This error appears when you resume a crawl and some images have not been downloaded yet. It has either been present for years, or it was introduced by updated modules that no longer support this code.

(dumpgenerator.py)

The error is in line 2260, which is:

if filename2 not in listdir:

The error occurs because the code is comparing a unicode string with a non-unicode (byte) string. This happens in Python 2.7 when files are not carefully saved with names in an encoding supported by the OS (I am currently running this from a Synology NAS, which means a current Linux).
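A minimal sketch of the mismatch, using made-up file names rather than anything from dumpgenerator.py. It only assumes that one side is a text (unicode) name and the other side is the same name as UTF-8 bytes:

```python
# -*- coding: utf-8 -*-
# Hypothetical values for illustration; not taken from dumpgenerator.py.
filename2 = u'caf\xe9.png'        # text (unicode) file name
listdir = [b'caf\xc3\xa9.png']    # the same name, stored as UTF-8 bytes

# Python 2 emits UnicodeWarning on this comparison and treats the two
# values as unequal; Python 3 also treats text and bytes as unequal.
found = filename2 in listdir
print(found)
```

Either way the membership test fails, so a file that already exists on disk looks like it is missing.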

It is fixed by modifying the listdir code (lines 2245-2246) as follows:

CHG            listdir = os.listdir('%s/images' % (config['path']))
ADD            listdir2 = [x.encode('utf-8') for x in listdir]
ADD            listdir = listdir2

(In essence, this converts the list into an equivalent list of UTF-8 encoded strings.)

And in what is now line 2262 (formerly 2260, moved two lines down by the two new lines):

CHG            if filename2.encode('utf-8') not in listdir:

This ensures the script compares a UTF-8 encoded string with a UTF-8 encoded string, and not a unicode string with bytes or anything else.
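The same idea can be sketched as a small standalone helper. The names `to_utf8` and `missing_files` are hypothetical and do not exist in dumpgenerator.py. The type check is an addition to the fix described here: in Python 2, calling `.encode('utf-8')` on a value that is already a byte string first decodes it as ASCII, which can raise UnicodeDecodeError for non-ASCII names, so this sketch encodes only text values.

```python
# -*- coding: utf-8 -*-
import sys

# Text type on either Python major version (the `unicode` name is only
# looked up when running under Python 2).
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_utf8(name):
    """Return name as UTF-8 bytes, whether it arrived as text or bytes."""
    if isinstance(name, text_type):
        return name.encode('utf-8')
    return name  # assumed to be UTF-8 encoded bytes already


def missing_files(wanted, existing):
    """Return the entries of wanted that have no match in existing."""
    existing_bytes = set(to_utf8(x) for x in existing)
    return [w for w in wanted if to_utf8(w) not in existing_bytes]


# Hypothetical names: one is already on disk (as bytes), one is not.
on_disk = [b'caf\xc3\xa9.png']
wanted = [u'caf\xe9.png', u'b.png']
print(missing_files(wanted, on_disk))
```

Building a set of normalized names once also avoids rescanning the directory listing for every file name.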

It is still advised to spend energy on the WikiTeam3 project as it makes no sense to keep this code alive anymore.