99moorem / wikiteam

Automatically exported from code.google.com/p/wikiteam

Download only a set of pages using --api=... --pagelist=file.txt --xml #29

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Idea from a thread on the wikiresearch-l mailing list about Special:Export.

----
Andrea Forte:
[...]
Better yet, does anyone already have a working script/tool handy that
grabs all the revisions of a page? :)
----

Original issue reported on code.google.com by emi...@gmail.com on 14 Jul 2011 at 6:56

GoogleCodeExporter commented 8 years ago

Original comment by ad...@alphacorp.tk on 23 Jun 2012 at 1:12

GoogleCodeExporter commented 8 years ago
Hi,
I have a script that downloads everything in a category and uploads it to 
archive.org. It is a merge of pywikibot and wikiteam.

Here are some instructions on how to customize it:

The hardcoded list of categories is here:
https://github.com/h4ck3rm1k3/wikiteam/blob/master/dumpgenerator.py#L138

Here is the line that contains the bucket to upload to. The bucket has to exist 
already (there is no code yet to create it):
https://github.com/h4ck3rm1k3/wikiteam/blob/master/dumpgenerator.py#L804

Then, in my scripts, I upload the results to a new Wikia wiki.

The target upload site is set in the separate poster script:
https://github.com/h4ck3rm1k3/wikiteam/blob/master/speedydeletion.py
That also needs a family to be set up:
https://github.com/h4ck3rm1k3/wikiteam/blob/master/families/speedydeletion_family.py

The script that runs it all from cron:
https://github.com/h4ck3rm1k3/wikiteam/blob/master/runexport.sh

Original comment by JamesMikeDuPont@googlemail.com on 19 Jul 2012 at 8:55

GoogleCodeExporter commented 8 years ago
When I have some time I will backport code to be usable.

Original comment by JamesMikeDuPont@googlemail.com on 19 Jul 2012 at 9:01

GoogleCodeExporter commented 8 years ago
OK, I did a quick hack for you. Use it like this:
python dumpgenerator_single.py --titles=kosovo.txt
The titles file contains the list of articles to dump.

It does not support resuming or uploading to archive.org.
https://github.com/h4ck3rm1k3/wikiteam/blob/master/dumpgenerator_single.py
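
For readers who want to see the idea without opening the script, here is a minimal sketch of the two steps involved: reading a titles file and building a Special:Export request for those pages. It is not taken from dumpgenerator_single.py. The function names `read_titles` and `build_export_query` are hypothetical, and the one-title-per-line file layout is an assumption. The form fields (`pages`, `history`, `curonly`, `action=submit`) are the ones Special:Export accepts, but a given wiki may cap or disable full-history export.

```python
from pathlib import Path
from urllib.parse import urlencode


def read_titles(path):
    """Return page titles from a text file (assumed: one title per line).

    Blank lines are skipped and duplicates are dropped, keeping first-seen order.
    """
    titles = []
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        title = line.strip()
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


def build_export_query(titles, full_history=True):
    """Build the URL-encoded form body for a Special:Export request."""
    params = {
        "title": "Special:Export",
        "pages": "\n".join(titles),  # Special:Export takes newline-separated titles
        "action": "submit",
    }
    if full_history:
        params["history"] = "1"  # ask for every revision of each page
    else:
        params["curonly"] = "1"  # ask for the latest revision only
    return urlencode(params)
```

The resulting string would then be sent as the POST body to the wiki's index.php. The sketch leaves out that request, along with retries and resuming.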

mike

Original comment by JamesMikeDuPont@googlemail.com on 19 Jul 2012 at 9:50

GoogleCodeExporter commented 8 years ago
Code needs to be reviewed, and then we can check it in.

Original comment by ad...@alphacorp.tk on 20 Jul 2012 at 10:48

GoogleCodeExporter commented 8 years ago
Hydriz, can you test JamesMikeDuPont's patch?

Original comment by nemow...@gmail.com on 8 Nov 2013 at 10:16

GoogleCodeExporter commented 8 years ago

Original comment by ad...@alphacorp.tk on 12 Nov 2013 at 6:38

GoogleCodeExporter commented 8 years ago
@Nemo: I guess we can leave this issue open until DumpGenerator 2.0 is 
released. I have already added this fix to 2.0 and it works like a charm :)

Original comment by ad...@alphacorp.tk on 16 Nov 2013 at 2:03

GoogleCodeExporter commented 8 years ago
Hydriz, maybe it's time to test that patch again? :)

Original comment by nemow...@gmail.com on 31 Jan 2014 at 3:31