Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
It's a pain to resume a download with commonsdownloader.py:
1) if the ZIP was already created for the day, it starts downloading again and in the end overwrites the ZIP (though you can kill it before it reaches the compression stage);
2) more importantly, if the day wasn't downloaded completely, it deletes the CSV file and starts downloading everything from scratch:
2a) wget already avoids redownloading files that are present,
2b) however curl redownloads the XML.
Hence resuming currently takes ages (see the sketch below for one way to skip files that are already on disk).
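
A minimal sketch of the kind of skip-if-present check that would make resuming fast. This is not commonsdownloader.py's actual code: the fetch() helper, file names and curl flags are assumptions made purely for illustration.

    import os
    import subprocess

    def fetch(url, dest):
        """Download url into dest, skipping files already present on disk."""
        if os.path.exists(dest) and os.path.getsize(dest) > 0:
            # Left over from an interrupted run: do not fetch it again.
            # This is the check wget effectively gives us for free (point 2a),
            # but which the curl-based XML download currently lacks (point 2b).
            return
        # --fail makes curl exit non-zero on HTTP errors instead of saving an error page.
        subprocess.check_call(['curl', '-L', '--fail', '-o', dest, url])

With a check like that, killing and restarting the script would simply walk past the files it already has instead of refetching the whole day.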
From nemow...@gmail.com on September 28, 2013 09:12:59
Original issue: http://code.google.com/p/wikiteam/issues/detail?id=65