WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

keeps attempting to download the same images #81

Open emijrp opened 10 years ago

emijrp commented 10 years ago

From chavez...@gmail.com on December 23, 2013 18:49:20

Every time I attempt to resume a download, it starts with "Retrieving images from 'Apple of Discord.png'". That file is already downloaded (as are many others that keep being downloaded again), and I can view it. However, the script keeps reporting the same number of downloaded images and keeps resuming from Apple of Discord. This may only be on my end, but I felt it should be reported just in case.

Original issue: http://code.google.com/p/wikiteam/issues/detail?id=81

emijrp commented 10 years ago

From nemow...@gmail.com on January 22, 2014 08:06:15

Thanks for the report. It would be useful to know the exact arguments you used, or at least the URL of the domain.

emijrp commented 10 years ago

From chavez...@gmail.com on February 23, 2014 19:21:25

Very sorry for the delayed reply. I can't remember the exact arguments, and the config file I have doesn't help my memory much. I was trying to download rationalwiki.org with only the current pages and images. It has also happened to me on one other site, but I can't remember its name.

Thyphoon05 commented 7 years ago

For me it always starts from 118 images. Maybe the script is reading the images folder incorrectly?

This happened to me while dumping www.wurmpedia.com
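
One way to test the "reading the images folder incorrectly" guess is to compare the names the script expects against what is actually on disk. The following is only a rough diagnostic sketch, not code from dumpgenerator: compare_with_disk and its arguments are hypothetical names, and it assumes the expected names are already in the same form the script uses when saving files.

```python
import os


def compare_with_disk(expected_names, images_dir):
    """Split expected image names into those found in images_dir and those not.

    Hypothetical helper for diagnosis only. It assumes expected_names are
    already in the exact form used for the files on disk.
    """
    on_disk = set(os.listdir(images_dir))
    present = [name for name in expected_names if name in on_disk]
    missing = [name for name in expected_names if name not in on_disk]
    return present, missing
```

If files that are clearly in the folder show up as missing, the names being compared probably differ from the names on disk (for example through encoding or truncation), which would explain a resume that keeps starting from the same count.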

Coloradohusky commented 3 years ago

I've been having this issue too. It starts over from the first image even though over 100,000 images have already been downloaded. My command is:
python dumpgenerator2.py --api=https://wiggles.fandom.com/api.php --xml --images --resume

Coloradohusky commented 3 years ago

A potential fix could be to look at the last modified/added image file in the images folder and start from there (replace if filename2 not in listdir with if filename2 == 'lastmodifiedimage.whatever').
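
The idea above could be sketched roughly as follows. This is not a patch against dumpgenerator: last_modified_file, remaining_images and the ignore_suffixes parameter are hypothetical names, and the sketch assumes that the image list is in download order and that its entries match the file names on disk exactly. The default ".desc" suffix is also an assumption about sidecar description files saved next to the images.

```python
import os


def last_modified_file(images_dir, ignore_suffixes=(".desc",)):
    """Return the name of the most recently modified file in images_dir.

    Returns None if the folder holds no matching files. ignore_suffixes is a
    hypothetical parameter for skipping sidecar files (assumed to end in
    ".desc") so that they are not mistaken for the last image.
    """
    newest_name, newest_mtime = None, -1.0
    with os.scandir(images_dir) as entries:
        for entry in entries:
            if not entry.is_file() or entry.name.endswith(ignore_suffixes):
                continue
            mtime = entry.stat().st_mtime
            if mtime > newest_mtime:
                newest_name, newest_mtime = entry.name, mtime
    return newest_name


def remaining_images(image_names, images_dir):
    """Return the tail of image_names starting at the last file written.

    The last file is included again because an interrupted run may have left
    it truncated. If nothing usable is found, the whole list is returned.
    """
    last = last_modified_file(images_dir)
    if last is None or last not in image_names:
        return list(image_names)
    return list(image_names[image_names.index(last):])
```

One caveat with this approach: modification times only reflect download order if nothing else has touched the folder since, so checking each name against the directory listing may still be the safer test where it works.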