Mattschillinger / wikiteam

Automatically exported from code.google.com/p/wikiteam

Screenscraping fails badly on some wikis #32

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
For instance, on http://wiki.urbandead.com/index.php/Special:Statistics only "35057
page titles loaded" out of 115,459 pages in the wiki:
http://p.defau.lt/?l5aakNK8w2Rm4SOBj5T4ng
The dump generator now uses the API instead, but images can't be downloaded this way (due to issue 1 / #22).
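For context, the API-based title enumeration mentioned above boils down to paging through MediaWiki's `list=allpages` query with `apcontinue`. Below is a minimal sketch of that pagination loop, not the actual dumpgenerator.py code; the `fetch` callable is a hypothetical helper injected so the logic is testable offline, and the endpoint/parameter names follow the standard MediaWiki action API:

```python
def iter_all_titles(fetch, limit=500):
    """Yield every page title, following apcontinue until exhausted.

    `fetch` is a hypothetical helper: it takes a dict of query
    parameters and returns the decoded JSON response from api.php.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": limit,
        "format": "json",
    }
    while True:
        data = fetch(dict(params))
        # Each batch carries up to `aplimit` titles.
        for page in data["query"]["allpages"]:
            yield page["title"]
        # The API signals more results via a "continue" block.
        cont = data.get("continue")
        if not cont:
            break
        params["apcontinue"] = cont["apcontinue"]

# A real fetch would be something like:
#   import json, urllib.parse, urllib.request
#   def fetch(params):
#       url = "http://wiki.urbandead.com/api.php?" + urllib.parse.urlencode(params)
#       with urllib.request.urlopen(url) as r:
#           return json.load(r)
```

Unlike screen scraping Special:Allpages, this can't silently drop titles: the loop only stops when the API itself reports no continuation.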

Original issue reported on code.google.com by nemow...@gmail.com on 15 Jul 2011 at 5:39

GoogleCodeExporter commented 8 years ago

Original comment by nemow...@gmail.com on 29 Feb 2012 at 11:41

GoogleCodeExporter commented 8 years ago

Original comment by ad...@alphacorp.tk on 22 Jun 2012 at 10:02

GoogleCodeExporter commented 8 years ago
The title list works fine now with the API:

$ wc -l wikiurbandeadcom-20131109-wikidump/*
       43 wikiurbandeadcom-20131109-wikidump/config.txt
        1 wikiurbandeadcom-20131109-wikidump/errors.log
  8486921 wikiurbandeadcom-20131109-wikidump/wikiurbandeadcom-20131109-history.xml
   129561 wikiurbandeadcom-20131109-wikidump/wikiurbandeadcom-20131109-titles.txt

The dump still fails as usual, though, because of that huge page with several GB of
history (issue 8).

Original comment by nemow...@gmail.com on 10 Nov 2013 at 9:23