WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

Script looks for a title that exists neither in the wiki nor in the titles list #31

Closed emijrp closed 10 years ago

emijrp commented 10 years ago

From nemow...@gmail.com on July 15, 2011 19:05:33

On a couple of wikis the script got stuck in an error loop because it was looking for a non-existent page. The title was recurrent and strange, "AMF5LKE43MNFGHKSDMRTJ", so I thought they were just weird wikis with mirrored junk content. On reviewing it, though, I discovered that the title never existed on those wikis and is not in the titles list either, and that Special:Export and the API both work. Probably some temporary craziness of Python or my machine which will be resolved by the next run, but still worth tracking. I attach the titles lists, and here's the terminal output (nothing useful anywhere): http://p.defau.lt/?tr7ACOxyEH4oBmQcD0_JCg http://p.defau.lt/?2_qzc5xn0_9jFv_eN5HhGw

Attachment: wikidocorg-20110712-titles.txt.7z wikiznanieru_ru_wz-20110712-titles.txt.7z

Original issue: http://code.google.com/p/wikiteam/issues/detail?id=31

emijrp commented 10 years ago

From nemow...@gmail.com on July 15, 2011 10:16:44

And again on a wiki whose Special:Export requires login and whose API doesn't provide titles: http://p.defau.lt/?9MV6O0TDa0SAJSBUmJ7Mfw

emijrp commented 10 years ago

From nemow...@gmail.com on July 15, 2011 10:47:14

No, it happened again... http://p.defau.lt/?Yzj1U1bNvYzbzVWRAWDVnw http://p.defau.lt/?Gv6rOhxkhjXefmNZ4I_fOQ

emijrp commented 10 years ago

From emi...@gmail.com on July 15, 2011 11:11:55

It is a random title used to extract the headers (namespace info and other metadata) from the XML export. I have now changed it to "Main_Page" to avoid possible errors.
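
For illustration, here is a minimal sketch of the idea, not the actual script code. It assumes the export XML has already been downloaded as a string; `extract_export_header`, `parse_namespaces`, and the `SAMPLE` document are hypothetical names and data made up for this example. When the requested title does not exist, the export has no `<page>` element, so the header is everything before the closing `</mediawiki>` tag.

```python
import re

# Hypothetical sample of what an export for a non-existent title might look
# like: site info is present, but there is no <page> element.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/" version="0.4" xml:lang="en">
  <siteinfo>
    <sitename>Example Wiki</sitename>
    <namespaces>
      <namespace key="-1">Special</namespace>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>
"""


def extract_export_header(xml_text):
    """Return the part of an export document that precedes the first page.

    If no <page> element is present, everything before the closing
    </mediawiki> tag is returned instead.
    """
    match = re.search(r"<page>|</mediawiki>", xml_text)
    if match is None:
        raise ValueError("not a MediaWiki export document")
    return xml_text[:match.start()]


def parse_namespaces(header):
    """Map namespace keys to names; the main namespace has an empty name."""
    pattern = r'<namespace key="(-?\d+)"[^>]*?(?:/>|>([^<]*)</namespace>)'
    return {int(key): name for key, name in re.findall(pattern, header)}


if __name__ == "__main__":
    header = extract_export_header(SAMPLE)
    print(parse_namespaces(header))
```

The header comes out the same whether or not the requested page exists, which is why any title works for this purpose. A fixed, plausible title such as "Main_Page" just avoids repeated requests for a page that is certain to be missing.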

By the way, this issue contains two different errors:

Regards

Status: Started

emijrp commented 10 years ago

From nemow...@gmail.com on February 29, 2012 15:44:03

Marking as a duplicate of the bug about namespace retrieval, then; the other issues are wiki-specific and would require separate bugs anyway.

Status: Duplicate
Mergedinto: 10