WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

Download only a set of pages using --api=... --pagelist=file.txt --xml #29

Open · emijrp opened this issue 10 years ago

emijrp commented 10 years ago

From emi...@gmail.com on July 14, 2011 20:56:27

Idea from a thread on the wikiresearch-l mailing list discussing Special:Export.


Andrea Forte: [...] Better yet, does anyone already have a working script/tool handy that grabs all the revisions of a page? :)

Original issue: http://code.google.com/p/wikiteam/issues/detail?id=29
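
For context, a minimal sketch (not part of dumpgenerator.py) of what "grabbing all the revisions of a page" looks like against a current MediaWiki api.php, following the continuation cursor until the history is exhausted; the endpoint URL and the page title below are placeholders:

```python
# Hedged sketch: list every revision of one page via the MediaWiki API.
# Assumes a modern api.php that uses the "continue" continuation style.
import requests

API = "https://en.wikipedia.org/w/api.php"   # placeholder endpoint


def all_revisions(title):
    """Yield revision metadata dicts, batch by batch, until the API stops continuing."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",  # add "content" to also get wikitext
        "rvlimit": "max",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        for rev in page.get("revisions", []):
            yield rev
        if "continue" not in data:
            break
        params.update(data["continue"])  # carries rvcontinue into the next request


for rev in all_revisions("Kosovo"):
    print(rev["revid"], rev["timestamp"], rev["user"])
```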

emijrp commented 10 years ago

From ad...@alphacorp.tk on June 23, 2012 06:12:26

Labels: -Type-Defect Type-Enhancement

emijrp commented 10 years ago

From JamesMikeDuPont@googlemail.com on July 19, 2012 01:55:54

Hi, I have a script that downloads everything in a category and uploads it to archive.org; it is a merge of pywikibot and wikiteam.

Here are some instructions on how to customize it:

The hardcoded list of categories is here: https://github.com/h4ck3rm1k3/wikiteam/blob/master/dumpgenerator.py#L138

This line contains the bucket to upload to; the bucket has to exist already (there is no code yet to create it): https://github.com/h4ck3rm1k3/wikiteam/blob/master/dumpgenerator.py#L804

Then, in my scripts, I upload the results to a new wikia. The target upload site is in the separate poster script: https://github.com/h4ck3rm1k3/wikiteam/blob/master/speedydeletion.py

That also needs a family to be set up: https://github.com/h4ck3rm1k3/wikiteam/blob/master/families/speedydeletion_family.py

The script that runs it all from cron: https://github.com/h4ck3rm1k3/wikiteam/blob/master/runexport.sh
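
For readers unfamiliar with the family mechanism mentioned above: a compat-era pywikibot family file is typically just a small class like the sketch below. The family name, hostname, and version here are assumptions for illustration, not a copy of speedydeletion_family.py:

```python
# Hypothetical sketch of a compat-era pywikibot family file; the name,
# hostname, and version are illustrative, not taken from the real file.
import family


class Family(family.Family):
    def __init__(self):
        family.Family.__init__(self)
        self.name = 'speedydeletion'
        self.langs = {'en': 'speedydeletion.example.org'}  # assumed hostname

    def scriptpath(self, code):
        return ''          # wiki serves index.php/api.php from the root

    def version(self, code):
        return '1.19'      # MediaWiki version the target wiki reports
```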

emijrp commented 10 years ago

From JamesMikeDuPont@googlemail.com on July 19, 2012 02:01:20

When I have some time I will backport the code so it is usable here.

emijrp commented 10 years ago

From JamesMikeDuPont@googlemail.com on July 19, 2012 02:50:52

Ok, I did a quick hack for you. Use it like this:

python dumpgenerator_single.py --titles=kosovo.txt

The titles file contains a list of articles to dump.

It does not do any resuming or uploading to archive.org: https://github.com/h4ck3rm1k3/wikiteam/blob/master/dumpgenerator_single.py

mike
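
The general approach behind such a titles-file dump, independent of this particular script, is a loop over Special:Export. A rough sketch follows; the wiki URL, the file names, and the exact parameters dumpgenerator_single.py uses are assumptions:

```python
# Illustrative only, not dumpgenerator_single.py itself: read a titles file
# and save the Special:Export XML of each page. URL and file names are
# placeholders; "history" is the documented Special:Export switch for full
# history, though some wikis cap or disable it.
import requests

INDEX = "https://en.wikipedia.org/w/index.php"   # placeholder index.php

with open("kosovo.txt") as f:
    titles = [line.strip() for line in f if line.strip()]

for title in titles:
    r = requests.post(INDEX, data={
        "title": "Special:Export",
        "pages": title,
        "history": "1",
        "action": "submit",
    })
    with open(title.replace("/", "_") + ".xml", "wb") as out:
        out.write(r.content)
```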

emijrp commented 10 years ago

From ad...@alphacorp.tk on July 20, 2012 03:48:17

Code needs to be reviewed, and then we can check it in.

Labels: NeedsReview

emijrp commented 10 years ago

From nemow...@gmail.com on November 08, 2013 14:16:34

Hydriz, can you test JamesMikeDuPont's patch?

emijrp commented 10 years ago

From ad...@alphacorp.tk on November 11, 2013 22:38:55

Blocking: wikiteam:75

emijrp commented 10 years ago

From ad...@alphacorp.tk on November 16, 2013 06:03:36

@Nemo: I guess we can leave this issue open until DumpGenerator 2.0 is released. I already added this fix into 2.0 and it works like a charm :)

emijrp commented 10 years ago

From nemow...@gmail.com on January 31, 2014 07:31:01

Hydriz, maybe it's time to test that patch again? :)