WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

Images download stuck on certain file formats #312

Open nemobis opened 6 years ago

nemobis commented 6 years ago

I noticed a handful of wikis that repeatedly get stuck while downloading files. In two cases the file had a swf extension and in one a jar extension. We need to test whether something odd happens with these formats (it might also be a misconfigured MediaWiki or web server; I don't remember how well such files are supported).

nemobis commented 6 years ago

It definitely keeps happening, e.g.:

Analysing http://imamp.colum.edu/mediawiki/api.php
Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
XML dump was completed in the previous session
Image list was completed in the previous session
340 images were found in the directory from a previous session
Retrieving images from "Car1.swf"

Also:

Analysing http://ids.snu.ac.kr/w/api.php
Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
XML dump was completed in the previous session
Image list was completed in the previous session
1449 images were found in the directory from a previous session
Retrieving images from "Mariadb-java-client-1.3.1.jar"

And:

Analysing http://mediawikibe.uwindsor.ca/clew/api.php
Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
XML dump was completed in the previous session
Image list was completed in the previous session
3 images were found in the directory from a previous session
Retrieving images from "1 Minute on the Sakai Cluster.swf"
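
One way to check whether the hang comes from the server simply never finishing the response would be to fetch the file with an explicit socket timeout. The following is only a minimal sketch under assumptions: it targets Python 3's standard library, `fetch_with_timeout` and its `opener` parameter are hypothetical names and not part of dumpgenerator.py, and the timeout applies to each blocking socket operation rather than to the download as a whole, so a server that trickles data very slowly would still not be caught.

```python
import os
import shutil
import urllib.request


def fetch_with_timeout(url, dest_path, timeout=30, opener=urllib.request.urlopen):
    """Download url to dest_path; return True on success, False otherwise.

    Hypothetical helper for illustration, not taken from dumpgenerator.py.
    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        # The timeout covers the connection attempt and every blocking read,
        # so a server that stops sending data raises instead of hanging.
        with opener(url, timeout=timeout) as response:
            with open(dest_path, "wb") as out:
                shutil.copyfileobj(response, out)
        return True
    except OSError as exc:
        # socket.timeout and urllib.error.URLError are both OSError subclasses.
        # Remove any partial file so a resumed run does not treat it as complete.
        if os.path.exists(dest_path):
            os.remove(dest_path)
        print("Skipping %s: %s" % (url, exc))
        return False
```

If a call like this returns False for the swf and jar files but succeeds for ordinary images on the same wiki, that would point towards the server configuration rather than the download loop itself.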