WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

dumpgenerator.py gives false ready/complete message #199

Open trianity opened 9 years ago

trianity commented 9 years ago

I received the following output when running dumpgenerator.py from an Ubuntu Linux terminal:

Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
Resuming XML dump from "Something"
Retrieving the XML for every page from "Something"
dumpgenerator.py:623: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if title == start: # start downloading from start, included
XML dump saved at... something-20141012-history.xml
Downloading index.php (Main Page) as index.html
Downloading Special:Version with extensions and other related info
Downloading site info as siteinfo.json
---> Congratulations! Your dump is complete <---

The final message is false. The dump was only about 10% done, far from complete, but the process ended right after the UnicodeWarning.
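
One plausible reading of the log above: in Python 2, comparing a byte string that contains non-ASCII characters with a Unicode string emits this UnicodeWarning and evaluates to "not equal". If that happens on the `if title == start:` line, the resume title is never matched, every remaining page is skipped, and the script runs straight through to the "complete" message. The following is only a minimal sketch of a defensive check, written in Python 3 and not taken from dumpgenerator.py; the helper names `to_text` and `titles_to_resume` and the UTF-8 assumption are hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding bytes if needed.

    The UTF-8 default is an assumption made for this sketch.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def titles_to_resume(titles, start):
    """Return the titles from `start` onward, `start` included.

    Both sides are normalized to text before comparing, so a bytes/text
    mismatch cannot silently make every comparison fail. If `start` is
    never found, raise instead of returning an empty list, so the caller
    cannot mistake "nothing matched" for "nothing left to download".
    """
    start = to_text(start)
    normalized = [to_text(title) for title in titles]
    if start not in normalized:
        raise ValueError("resume title %r not found in title list" % start)
    return normalized[normalized.index(start):]
```

With a check of this kind, a resume point that cannot be matched stops the run with an error rather than ending in a success message.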

nemobis commented 9 years ago

Thanks for your report and sorry for the lack of answer. You are right so there wasn't much to say. :)

We are aware of the general issue, which is a bad one, and have tracked it for a while at https://github.com/WikiTeam/wikiteam/issues/145

In https://github.com/WikiTeam/wikiteam/issues/214#issuecomment-78017869 I've now proposed a solution for "simple" cases like yours. Either way, I've added a reminder to check this bug is fixed before our next "release".