Mattschillinger / wikiteam

Automatically exported from code.google.com/p/wikiteam

Using API, dumpgenerator.py finds only 500 pages per namespace #56

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Example:

Checking api.php... http://www.aiowiki.com/w/api.php
api.php is OK
Checking index.php... http://www.aiowiki.com/w/index.php
index.php is OK
Analysing http://www.aiowiki.com/w/api.php
Trying generating a new dump into a new directory...
Loading page titles from namespaces = all
Excluding titles from namespaces = None
30 namespaces found
    Retrieving titles in the namespace 0
.    500 titles retrieved in the namespace 0
    Retrieving titles in the namespace 1
.    308 titles retrieved in the namespace 1

etc. In the end it finds only about five thousand titles instead of 43 
thousand. 
http://www.aiowiki.com/w/api.php?action=query&meta=siteinfo&siprop=statistics
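The expected total can be read from that siteinfo statistics response and compared against the number of titles the dump actually collected. A minimal sketch of such a sanity check, assuming the standard JSON shape of `action=query&meta=siteinfo&siprop=statistics` (the helper function and the sample numbers are illustrative, not part of dumpgenerator.py):

```python
import json

def total_pages(siteinfo_json):
    """Return the 'pages' counter from a siteinfo statistics response."""
    return siteinfo_json["query"]["statistics"]["pages"]

# Sample response with placeholder numbers, in the standard statistics shape:
sample = json.loads('{"query": {"statistics": {"pages": 43000, "articles": 7000}}}')
print(total_pages(sample))
```

Comparing this counter with `len(titles)` after the namespace loop would make a truncated title list (as in this report) immediately visible.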

I've tested most old revisions of dumpgenerator.py and it's the same. Using 
only index.py works.

Original issue reported on code.google.com by nemow...@gmail.com on 9 Nov 2012 at 10:03

GoogleCodeExporter commented 8 years ago

Original comment by nemow...@gmail.com on 9 Nov 2012 at 10:05

GoogleCodeExporter commented 8 years ago
Of course I meant index.php (with --index= option).

Original comment by nemow...@gmail.com on 9 Nov 2012 at 10:37

GoogleCodeExporter commented 8 years ago
Can you give us a heads-up about which revision you are using?

Original comment by ad...@alphacorp.tk on 9 Nov 2012 at 3:29

GoogleCodeExporter commented 8 years ago
As I said, I've tested several of them. IIRC, r804, r709, r675, r610, r343, 
r261 and probably some more.

Original comment by nemow...@gmail.com on 9 Nov 2012 at 4:39

GoogleCodeExporter commented 8 years ago
http://www.aiowiki.com/wiki/Special:Version shows version 1.20, latest stable.

Nemo: the API has for some time returned a maximum of 500 titles per request 
for most accessors, 5000 for a logged-in member of the bot usergroup. It sounds 
to me (without looking at the code) as though the script should be updated to 
use generator= in the query, and loop through.
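The fix Amgine describes amounts to following API continuation until the list is exhausted, instead of stopping after the first 500-title batch. A hedged sketch of that loop for `list=allpages`, assuming the MediaWiki 1.20-era continuation format (`query-continue` with `apfrom`; the `fetch` injection point is an illustrative convenience, not dumpgenerator.py's actual structure):

```python
import json
import urllib.parse
import urllib.request

def all_titles(api_url, namespace=0, limit=500, fetch=None):
    """Yield every title in a namespace, following API continuation.

    `fetch` takes a params dict and returns the decoded JSON response;
    by default it performs a real HTTP GET against `api_url`.
    """
    if fetch is None:
        def fetch(params):
            url = api_url + "?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)

    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": limit,
        "format": "json",
    }
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
        # MediaWiki of this era signals more results via query-continue;
        # stop once no continuation parameters are returned.
        cont = data.get("query-continue", {}).get("allpages")
        if not cont:
            break
        params.update(cont)
```

Each iteration requests the next batch starting from the continuation value, so the 500-per-request server cap no longer truncates the title list.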

Original comment by amg...@wikimedians.ca on 9 Nov 2012 at 4:41

GoogleCodeExporter commented 8 years ago
Amgine, thanks. That's surely the problem; we really don't know how to use the 
API properly. ;-)
I don't know if pagegenerators would be the best approach, because we use the 
byproduct list of titles quite a lot, and IIRC with generators the titles would 
be fed directly to the export query.
Anyway, let's see whether people manage to rewrite the script properly from 
scratch or whether we have to fix this in our original one.
https://meta.wikimedia.org/wiki/WikiTeam/Dumpgenerator_rewrite

Original comment by nemow...@gmail.com on 9 Nov 2012 at 7:30

GoogleCodeExporter commented 8 years ago
Fixed by emijrp in r806. :-)

Original comment by nemow...@gmail.com on 9 Nov 2012 at 7:32