WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest ones. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam/wikiteam
GNU General Public License v3.0

wikia.py: actually get the correct DB name #215

Open · nemobis opened this issue 9 years ago

nemobis commented 9 years ago

The check at https://github.com/WikiTeam/wikiteam/blob/ce6fbfee557582126fd4b7b8ff1653b7fc589da5/listsofwikis/mediawiki/wikia.py#L53 is too simplistic. For instance, http://hkbus.wikia.com/wiki/Special:Version points to http://s3.amazonaws.com/wikia_xml_dumps/z/zh/zhhongkongbus_pages_full.xml.gz (the wiki was probably renamed at some point?).

The script concludes that a dump is missing, when it is actually just looking in the wrong place.

We can't rely on the Special:Statistics links, as wikiadownloader.py does, because they are often wrong (outdated, or missing even when a dump exists). We can, however, just fetch the "wikiid" from http://hkbus.wikia.com/api.php?action=query&meta=siteinfo&siprop=general.
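
A minimal sketch of that approach (the helper name and the use of the `requests` library are my own; the `<first letter>/<first two letters>/<dbname>_pages_full.xml.gz` path pattern is the one visible in the hkbus example above):

```python
import requests

def wikia_dump_url(wiki_base_url):
    """Fetch the wiki's real DB name ("wikiid") via the siteinfo API
    and build the expected Wikia dump URL from it."""
    api = wiki_base_url.rstrip("/") + "/api.php"
    r = requests.get(api, params={
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general",
        "format": "json",
    })
    r.raise_for_status()
    # "wikiid" is the wiki's database name ($wgDBname), e.g. "zhhongkongbus".
    dbname = r.json()["query"]["general"]["wikiid"]
    # Dumps appear to be sharded by the first one and two characters of the DB name.
    return ("https://s3.amazonaws.com/wikia_xml_dumps/"
            + dbname[0] + "/" + dbname[:2] + "/" + dbname + "_pages_full.xml.gz")

print(wikia_dump_url("http://hkbus.wikia.com"))
# Expected: https://s3.amazonaws.com/wikia_xml_dumps/z/zh/zhhongkongbus_pages_full.xml.gz
```

This way the dump location comes from the wiki itself rather than from its (possibly stale) hostname.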

nemobis commented 6 years ago

Is this fixed now? See the most recent dump produced by the script: https://github.com/WikiTeam/wikiteam/pull/310

nemobis commented 4 years ago

This got more difficult, I think, with the recent domain name changes. For instance, a wiki can be listed as "mac.fandom.com/zh" while its dump is supposedly at https://s3.amazonaws.com/wikia_xml_dumps/z/zh/zhwikimac_pages_current.xml.7z (according to Special:Statistics), or listed as "townn-titles.fandom.com/ru/" with the dump supposedly at https://s3.amazonaws.com/wikia_xml_dumps/t/to/townntitles_pages_current.xml.7z.

I say supposedly because none of these links works. Maybe we should just give up on the Wikia-generated dumps.
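
Before giving up, it might be worth checking the candidate URLs programmatically instead of trusting Special:Statistics. A quick sketch, assuming a plain HTTP HEAD against S3 is enough to tell whether a dump object still exists (the function name and `requests` usage are my own):

```python
import requests

def dump_exists(url, timeout=30):
    """Return True if the dump URL answers HTTP 200; S3 serves HEAD
    requests, so nothing is actually downloaded."""
    try:
        r = requests.head(url, allow_redirects=True, timeout=timeout)
        return r.status_code == 200
    except requests.RequestException:
        return False

for url in (
    "https://s3.amazonaws.com/wikia_xml_dumps/z/zh/zhwikimac_pages_current.xml.7z",
    "https://s3.amazonaws.com/wikia_xml_dumps/t/to/townntitles_pages_current.xml.7z",
):
    print(url, "->", "found" if dump_exists(url) else "missing")
```

If both of the examples above come back missing, that would support dropping the Wikia-generated dumps and archiving these wikis ourselves.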