Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis.
wikiteam/dumpgenerator.py:2260: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal #435
This error appears if you resume a crawl and some images have not been downloaded yet.
It must have been present for years, or it may be caused by modules that were updated and no longer support this code.
(dumpgenerator.py)
The error is in line 2260, which is:
if filename2 not in listdir:
The error occurs because the code compares a unicode object with a non-unicode byte string (str). This happens in Python 2.7 when filenames are not carefully saved in an encoding supported by the OS (I am currently running this on a Synology NAS, which means a current Linux).
It is fixed by modifying the listdir code (lines 2245-2246) thus:
CHG listdir = os.listdir('%s/images' % (config['path']))
ADD listdir2 = [x.encode('utf-8') for x in listdir]
ADD listdir = listdir2
(in essence, converting the list into an equivalent UTF-8 encoded list)
And in what is now line 2262 (former 2260 but now two lines further down due to the two new lines):
CHG if filename2.encode('utf-8') not in listdir:
This ensures the script compares a UTF-8 encoded string with a UTF-8 encoded string, and not a unicode string with a byte string or anything else.
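As a hedged illustration of the same idea, the sketch below normalizes both sides of the membership test to UTF-8 bytes before comparing. The helper names `to_utf8_bytes` and `missing_files` are hypothetical and do not exist in dumpgenerator.py. Unlike the patch above, it leaves entries that are already byte strings untouched; that is an assumption about what `os.listdir` may return, not something verified against the script.

```python
# -*- coding: utf-8 -*-
# Minimal sketch, not the code from dumpgenerator.py.
# Written to behave the same on Python 2.7 and Python 3.


def to_utf8_bytes(name):
    """Return `name` as UTF-8 bytes, whether it arrives as text or bytes."""
    if isinstance(name, bytes):
        # Already a byte string (plain `str` on Python 2). Calling
        # .encode() here on Python 2 would first decode it as ASCII
        # and could raise UnicodeDecodeError, so return it unchanged.
        return name
    return name.encode('utf-8')


def missing_files(expected, existing):
    """List the names in `expected` that are absent from `existing`.

    Both sides are converted to UTF-8 bytes first, so text is never
    compared with bytes.
    """
    existing_bytes = set(to_utf8_bytes(n) for n in existing)
    return [n for n in expected if to_utf8_bytes(n) not in existing_bytes]


if __name__ == '__main__':
    # Hypothetical directory listing mixing bytes and text entries.
    on_disk = [u'Caf\xe9.png'.encode('utf-8'), u'plain.jpg']
    wanted = [u'Caf\xe9.png', u'plain.jpg', u'missing.gif']
    print(missing_files(wanted, on_disk))
```

Using a set for the directory listing also avoids rescanning the whole list for every filename, which may matter when an images directory holds many files.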
It is still advisable to put effort into the WikiTeam3 project instead, as it makes no sense to keep this code alive anymore.