Trying to download over 100,000 images from a wiki. Dumpgenerator.py gets a few thousand in, then hits a 404 error on a single image and exits. Should there not be an option to just skip that image and continue with the next images?
On 26/07/21 15:15, floogal wrote:
Should there not be an option to just skip that image and continue with the next images?
There is: it's called retrying manually. There's no automatic handling because nobody has yet devised a way to automatically guess whether an error is real or not and what to do about it.
But if you do retry, what then? It'll likely return 404 again, and exit again. The only sane workaround I see is to allow some 404 count threshold in args...
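For illustration only, here is a minimal sketch of what such a threshold could look like, assuming a plain download loop; this is hypothetical, not an existing dumpgenerator.py option, and the function and argument names are invented:

```python
import requests

def download_images(urls, max_404=50):
    # Hypothetical 404 threshold: skip individual missing images, but
    # abort once too many accumulate, so a broken wiki is not silently
    # half-dumped and declared complete.
    missing = []
    for url in urls:
        resp = requests.get(url)
        if resp.status_code == 404:
            missing.append(url)
            if len(missing) > max_404:
                raise RuntimeError("too many 404s (%d), aborting" % len(missing))
            continue  # skip this one image and move on
        resp.raise_for_status()
        with open(url.rsplit("/", 1)[-1], "wb") as f:
            f.write(resp.content)
    return missing  # report skipped images so the dump is not assumed complete
```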
We err on the side of caution. It's dangerous to automatically skip failed images and call a dump done anyway, because the user may incorrectly think the wiki was archived when it was not. Therefore we force a manual fix.
On 16/11/21 01:50, burner1024 wrote:
- Try again, 404. Again, 404. What's the fix?
If the image is truly missing, you need to remove it manually from the list of titles to download.
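As a minimal sketch of that manual fix, assuming the list is the tab-separated *-images.txt file that dumpgenerator.py writes (one image per line, filename first; the filenames below are made up):

```python
listfile = "examplewiki-20211116-images.txt"  # example name; yours will differ
bad_title = "Some_missing_file.png"           # the image that keeps returning 404

with open(listfile) as f:
    lines = f.readlines()

# Rewrite the list without the known-missing image.
with open(listfile, "w") as f:
    f.writelines(line for line in lines if line.split("\t", 1)[0] != bad_title)
```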
Oh, one can do that. Well, I guess technically that works.
If the image is truly missing
How is this supposed to be determined, then?
On 16/11/21 12:45, burner1024 wrote:
How is this supposed to be determined, then?
Probably one would start from the MediaWiki interface and compare the information reported by the MediaWiki interface, the MediaWiki API, the webserver, and other sources. If they disagree, debug the likely misconfiguration or software bug, and find out what data can still be pulled out.
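As a rough, hypothetical example of one such cross-check, this asks the MediaWiki API whether the file exists at all and which URL it reports, to compare against the URL that returned the 404 (the endpoint and title below are placeholders):

```python
import requests

api = "https://wiki.example.org/w/api.php"  # placeholder API endpoint
params = {
    "action": "query",
    "titles": "File:Example.png",  # the image that 404s
    "prop": "imageinfo",
    "iiprop": "url|sha1",
    "format": "json",
}
pages = requests.get(api, params=params).json()["query"]["pages"]
page = next(iter(pages.values()))
if "missing" in page:
    print("The wiki itself says the file does not exist; safe to drop it.")
else:
    print("API reports the file at:", page["imageinfo"][0]["url"])
```

If the API says the file exists but its URL still returns 404, the webserver or upload directory is the more likely culprit than the wiki database.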
Archiving MediaWiki sites requires knowledge of MediaWiki; there's little to do about that. If you don't have intimate knowledge of MediaWiki, it's still useful to try: just make sure to note that when you archive your dumps on archive.org. If you actually need to transfer a wiki with 100k images, I'd probably recommend hiring an expert.