nemobis opened this issue 9 years ago
The example was from launcher.py: fixing that requires https://github.com/WikiTeam/wikiteam/issues/145
However, we could also improve checkXMLIntegrity() in a simple way: copy the list of titles to a new file, remove each title from that list as we find it in the dump, and then check that no titles are left. This should also fix https://github.com/WikiTeam/wikiteam/issues/199
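A minimal sketch of that title check, in Python like launcher.py. It is not existing WikiTeam code: `missing_titles` and `check_titles` are hypothetical names, and the sketch assumes the titles list has one title per line (possibly ending with a `--END--` marker line) and that each `<title>` element sits on a single line of the dump. Instead of copying the list to a new file, it keeps the remaining titles in an in-memory set, which is simpler but holds every title in RAM.

```python
import html
import re
from typing import Iterable, Set

# Matches a <title> element written on a single line of the XML dump.
TITLE_RE = re.compile(r"<title>(.*?)</title>")

# Assumption: the titles list may end with this sentinel line, which is not a page title.
END_MARKER = "--END--"


def missing_titles(title_lines: Iterable[str], dump_lines: Iterable[str]) -> Set[str]:
    """Return the titles from the titles list that never appear in the dump."""
    # Start with every expected title...
    remaining = {line.rstrip("\r\n") for line in title_lines}
    remaining.discard(END_MARKER)
    remaining.discard("")
    # ...and cross each one off as we meet it in the dump.
    for line in dump_lines:
        match = TITLE_RE.search(line)
        if match:
            # Titles are XML-escaped in the dump (e.g. "&amp;"), so unescape before comparing.
            remaining.discard(html.unescape(match.group(1)))
            if not remaining:
                break  # everything found, no need to read the rest of the dump
    return remaining


def check_titles(titles_path: str, dump_path: str) -> Set[str]:
    """Hypothetical file-based wrapper: both files are streamed line by line."""
    with open(titles_path, encoding="utf-8") as titles, open(dump_path, encoding="utf-8") as dump:
        return missing_titles(titles, dump)
```

Calling `check_titles(titles_path, dump_path)` returns the titles still missing; an empty set means every listed page made it into the dump.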
Then there is actual validation; see https://github.com/WikiTeam/wikiteam/issues/128
Currently, we only check that the XML is well-formed and that it ends with the expected closing tag. We should also check that the dump was not cut short, as often happens when a wiki is problematic.
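For reference, the two existing checks can be sketched roughly as follows. This is not the actual checkXMLIntegrity() code: it assumes the dump uses MediaWiki's XML export format, whose root element is `<mediawiki>`, and `is_well_formed` and `ends_with_tag` are hypothetical helpers built on the standard library's `xml.etree.ElementTree.iterparse`.

```python
import xml.etree.ElementTree as ET
from typing import IO, Union

# Assumption: MediaWiki's export format closes the document with this root tag.
DEFAULT_CLOSING_TAG = "</mediawiki>"


def is_well_formed(source: Union[str, IO[bytes]]) -> bool:
    """Stream-parse the whole file; a syntax error or truncation raises ParseError."""
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()  # drop each element's content once parsed, to limit memory use
        return True
    except ET.ParseError:
        return False


def ends_with_tag(path: str, closing_tag: str = DEFAULT_CLOSING_TAG, tail_bytes: int = 4096) -> bool:
    """Check that the last non-whitespace text in the file is the closing tag."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        size = f.tell()
        f.seek(max(0, size - tail_bytes))  # read only the tail, not the whole dump
        tail = f.read()
    return tail.rstrip().endswith(closing_tag.encode("utf-8"))
```

A dump whose download stopped early but whose closing tag was still written can pass both checks, which is why comparing page and revision counts is useful on top of them.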
We now download the siteinfo, so we can compare the numbers of revisions and pages in the dump with the "real" ones, even when the dump is already compressed, like this:
We probably want to allow some margin before retrying, or just log the mismatch somewhere visible; otherwise, if the wiki's site statistics are out of date or a page has been deleted, the numbers will never match. (The example above has 75 % of revisions missing.)
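The count comparison and the margin could be combined roughly like this. It is a sketch, not WikiTeam code: it assumes siteinfo.json holds the raw API reply for `action=query&meta=siteinfo&siprop=statistics`, treats the `pages` and `edits` statistics as the expected page and revision counts, and uses a hypothetical 5 % margin; all function names are illustrative. Python's standard library cannot read 7z archives, so `open_dump` only handles plain, .bz2, .gz and .xz files; for a .7z dump, the output of `7z x -so` could be fed to `count_pages_and_revisions` instead.

```python
import bz2
import gzip
import json
import logging
import lzma
from typing import IO, Dict, Iterable, List

# Hypothetical tolerance: how far below the siteinfo figures the dump may fall
# before we complain (0.05 = 5 %). Site statistics can be stale and deleted
# pages may still be counted, so an exact match is not expected.
DEFAULT_MARGIN = 0.05


def open_dump(path: str) -> IO[str]:
    """Open a plain, .bz2, .gz or .xz dump as text."""
    openers = {".bz2": bz2.open, ".gz": gzip.open, ".xz": lzma.open}
    for suffix, opener in openers.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def count_pages_and_revisions(lines: Iterable[str]) -> Dict[str, int]:
    """Count opening <page> and <revision> tags while streaming the dump."""
    counts = {"pages": 0, "revisions": 0}
    for line in lines:
        # Wikitext inside the dump is XML-escaped, so these literals only match real tags.
        counts["pages"] += line.count("<page>")
        counts["revisions"] += line.count("<revision>")
    return counts


def load_statistics(siteinfo_path: str) -> Dict[str, int]:
    """Assumption: the file is the raw JSON API reply including siprop=statistics."""
    with open(siteinfo_path, encoding="utf-8") as f:
        return json.load(f)["query"]["statistics"]


def shortfalls(counts: Dict[str, int], stats: Dict[str, int],
               margin: float = DEFAULT_MARGIN) -> List[str]:
    """Describe every count that falls short of the siteinfo figure by more than `margin`."""
    # Assumption: "pages" and "edits" are the closest siteinfo figures to the dump contents.
    expected = {"pages": stats["pages"], "revisions": stats["edits"]}
    problems = []
    for key, want in expected.items():
        got = counts[key]
        if got < want * (1 - margin):  # never true when want == 0, so no division by zero
            missing = 100 * (want - got) / want
            problems.append(f"{key}: {got} in dump vs {want} in siteinfo ({missing:.0f} % missing)")
    return problems


def check_dump(dump_path: str, siteinfo_path: str, margin: float = DEFAULT_MARGIN) -> bool:
    """Log shortfalls as warnings instead of retrying; return True if none were found."""
    with open_dump(dump_path) as dump:
        counts = count_pages_and_revisions(dump)
    problems = shortfalls(counts, load_statistics(siteinfo_path), margin)
    for message in problems:
        logging.warning("Dump %s may be incomplete: %s", dump_path, message)
    return not problems
```

Only shortfalls are reported, and they are logged rather than triggering a retry, so stale statistics or deleted pages produce a visible warning instead of an endless loop.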