GERZAC1002 closed this issue 2 years ago
Unfortunately, this wiki intentionally returns HTTP 403 for api.php in many cases, so it's an arms race: if we implement a workaround, they will just block different user agents or whatever. I suggest contacting the sysadmins so that they create regular dumps themselves and make them available on the Internet Archive; then people won't be tempted to export manually so often.
I don't recommend using your other method with Special:Export because it will increase their load and therefore invite more blocks.
Oh okay, that's understandable considering that the whole dump ended up at over 30 GB. I would actually have considered asking them for a dump if I hadn't found the alternative. The alternative to using this tool would have been mirroring the whole site with HTTrack, which would have had a much bigger overhead: the last time I tried that on a wiki, it downloaded the complete history of every page and had no option to easily exclude namespaces.

Any recommendations on how to put the dump on the Internet Archive, as it is huge with all the images? (Compressing the folder would still exceed the maximum file size of a FAT32-formatted drive, which sadly is still a common standard, so I don't know how viable that is.)

After the above question is answered, I guess this issue can be closed, since it seems the features were intentionally disabled by the administrators of the wiki.
EDIT: I found https://archive.org/download/wiki-pokewikide. Is there a way to add the dump that I already have (after compressing it)?
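If uploading from the command line is an option, a minimal sketch with the ia tool from the internetarchive Python package could look like this (the item identifier, file name, and metadata here are illustrative assumptions; note that adding files to an existing item generally requires owning it, so creating a new item may be necessary):

pip install internetarchive
ia configure                                          # enter archive.org credentials once
ia upload wiki-pokewikide-2022 pokewikide-dump.7z --metadata="mediatype:web"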
On 12/04/22 20:24, Gernot Zacharias wrote:
> Any recommendations on how to put the dump on the Internet Archive, as it is huge with all the images?
If the wiki admins made the dump, it would be on their server, so the upload to the Internet Archive would probably be quite fast.
> (Compressing the folder would still exceed the maximum file size of a FAT32-formatted drive, which sadly is still a common standard, so I don't know how viable that is.)
You can start with the history 7z which launcher.py would produce; it's going to be much smaller. It's OK to upload a 30 GB file to the Internet Archive. If you have a FAT-formatted HDD, you can create 4 GB volumes.
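For the volume splitting, something like this should work with 7-Zip (a sketch; the archive and folder names are illustrative, and -v3900m stays safely below the FAT32 file-size cap of 4 GiB minus one byte):

7z a -v3900m pokewikide-dump.7z pokewikide-dump/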
If your connection is not sufficiently reliable/fast to finish a 30 GB upload, you can create a torrent file containing the file and upload the torrent file instead; the Internet Archive will then download the content from your torrent client.
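A sketch of that workflow, assuming mktorrent is installed (the tracker URL and file names are illustrative):

mktorrent -a udp://tracker.opentrackr.org:1337/announce -o pokewikide-dump.torrent pokewikide-dump.7z
# keep the archive seeding in a torrent client, then upload the small .torrent
# file to the Internet Archive in place of the 30 GB archive itself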
Full command that was used:
./dumpgenerator.py --xmlrevisions --images --xml --curonly https://pokewiki.de --namespace 0
I used the command without '--namespace 0' before with the same result; I only had to add it to reproduce the error without putting too much stress on the wiki itself.

Expected behaviour:
creating a dump of https://pokewiki.de
Actual behaviour after a few minutes:
Full log: dumgenerator.py_xmlrevisions.log
Tail of the output file:
Quick 'integrity' check on the output file:
Number of page titles inside *-titles.txt: 86796
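A sketch of how such a check can be done, assuming the title count is compared against the <title> tags in the XML dump (the file names follow dumpgenerator's usual naming and are illustrative here):

wc -l pokewikide*-titles.txt
grep -c "<title>" pokewikide*-current.xml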
Test without '--xmlrevisions'
After taking pull request #280 from back in 2016 and integrating it into a new version (pull request #429), I managed to get a full dump of the mentioned wiki.
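For anyone who wants to try the same fix before it is merged, a GitHub pull request branch can be checked out locally (a sketch; the local branch name is illustrative):

git fetch origin pull/429/head:pr-429
git checkout pr-429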