WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0
714 stars, 148 forks

dumpgenerator.py on Windows and filesystem support #73

Open emijrp opened 10 years ago

emijrp commented 10 years ago

From lugu...@gmail.com on November 11, 2013 16:37:18

Found an issue because I'm running r866 on a Windows machine and because of the "?" in http://mamedev.emulab.it/undumped/index.php?title=File:Jinglebell(global?).jpg

    Checking api.php... http://mamedev.emulab.it/undumped/api.php
    api.php is OK
    Checking index.php... http://mamedev.emulab.it/undumped/index.php
    index.php is OK
    Analysing http://mamedev.emulab.it/undumped/api.php
    Loading config file...
    Resuming previous dump process...
    Domain is mamedevemulabit_undumped
    Title list was completed in the previous session
    Domain is mamedevemulabit_undumped
    XML dump was completed in the previous session
    Domain is mamedevemulabit_undumped
    Image list was completed in the previous session
    1436 images were found in the directory from a previous session
    Retrieving images from "Jigsaw Paradise.jpg"
    Traceback (most recent call last):
      File "C:\Luiz Augusto\wikiteam\dumpgenerator.py", line 1205, in <module>
        main()
      File "C:\Luiz Augusto\wikiteam\dumpgenerator.py", line 1196, in main
        resumePreviousDump(config=config, other=other)
      File "C:\Luiz Augusto\wikiteam\dumpgenerator.py", line 1112, in resumePreviousDump
        generateImageDump(config=config, other=other, images=images, start=lastfilename2) # we resume from previous image, which may be corrupted (or missing .desc) by the previous session ctrl-c or abort
      File "C:\Luiz Augusto\wikiteam\dumpgenerator.py", line 673, in generateImageDump
        urllib.urlretrieve(url=url, filename='%s/%s' % (imagepath, filename2) )
      File "C:\Python27\lib\urllib.py", line 94, in urlretrieve
        return _urlopener.retrieve(url, filename, reporthook, data)
      File "C:\Python27\lib\urllib.py", line 244, in retrieve
        tfp = open(filename, 'wb')
    IOError: [Errno 22] invalid mode ('wb') or filename: './mamedevemulabit_undumped-20131110-wikidump/images/Jinglebell(global?).jpg'

Original issue: http://code.google.com/p/wikiteam/issues/detail?id=73

emijrp commented 10 years ago

From nemow...@gmail.com on November 11, 2013 23:45:39

AFAIK this problem doesn't exist in GNU/Linux.

emijrp commented 10 years ago

From joehow...@gmail.com on January 27, 2014 11:35:10

This has happened to me 4 times today, with 3 of them the same message as above. The one that was different (but still an IOError: Errno 22):

    Traceback (most recent call last):
      File "dumpgenerator.py", line 1220, in <module>
        main()
      File "dumpgenerator.py", line 1213, in main
        createNewDump(config=config, other=other)
      File "dumpgenerator.py", line 1018, in createNewDump
        generateImageDump(config=config, other=other, images=images)
      File "dumpgenerator.py", line 673, in generateImageDump
        urllib.urlretrieve(url=url, filename='%s/%s' % (imagepath, filename2) )
      File "C:\Python27\lib\urllib.py", line 94, in urlretrieve
        return _urlopener.retrieve(url, filename, reporthook, data)
      File "C:\Python27\lib\urllib.py", line 244, in retrieve
        tfp = open(filename, 'wb')
    IOError: [Errno 22] invalid mode ('wb') or filename: './frbatmanwikiacom-20140127-wikidump/images/The Dark Knight Soundtrack - 01 Why So Serious?'

I am running Windows 7.

Attachment: errno 22.png

emijrp commented 10 years ago

From nemow...@gmail.com on January 31, 2014 07:28:59

Blocking: wikiteam:86

emijrp commented 10 years ago

From nemow...@gmail.com on January 31, 2014 07:29:38

Labels: OpSys-Windows

shikulja commented 5 years ago

    XML dump saved at... dragonagefandomcom_ru-20190121-current.xml
    Retrieving image filenames ......................... Found 12221 images
    12221 image names loaded
    Image filenames and URLs saved at... dragonagefandomcom_ru-20190121-images.txt
    Retrieving images from "start"
    Creating "./dragonagefandomcom_ru-20190121-wikidump/images" directory
    Traceback (most recent call last):
      File "dumpgenerator.py", line 2323, in <module>
        main()
      File "dumpgenerator.py", line 2315, in main
        createNewDump(config=config, other=other)
      File "dumpgenerator.py", line 1894, in createNewDump
        session=other['session'])
      File "dumpgenerator.py", line 1299, in generateImageDump
        imagefile = open(filename3, 'wb')
    IOError: [Errno 22] invalid mode ('wb') or filename: u'./dragonagefandomcom_ru-20190121-wikidump/images/latest?cb=20091124141052&path-prefix=ru'

not all use linux, fix it
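
The filename in this last traceback ("latest?cb=20091124141052&path-prefix=ru") looks like the tail of a URL with its query string still attached. A minimal sketch of one way to avoid that, assuming the name really is derived from the image URL (the thread does not confirm this): keep only the last path segment and drop the query. The helper name filename_from_url and the example.org URL are made up for illustration, and the sketch targets Python 3 rather than the Python 2.7 shown in the tracebacks.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Hypothetical helper: derive a file name from the URL path only."""
    # urlsplit separates the path from the query, so "?cb=..." is dropped.
    path = urlsplit(url).path
    # Take the last path segment and decode percent-escapes such as %20.
    return unquote(posixpath.basename(path))


# Illustrative URL, not taken from any real wiki.
print(filename_from_url(
    "http://example.org/w/images/Foo%20Bar.jpg?cb=123&path-prefix=ru"
))  # prints: Foo Bar.jpg
```

For a URL whose last path segment is not the real image name, this still yields a legal but unhelpful name, so the title from the image list would be a better source where it is available.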

cooperdk commented 2 years ago

not all use linux, fix it

I agree, an error like this shouldn't happen on a Python based tool. Python is able to handle both Linux and Windows operating systems as well as filesystems. I have never had any errors in the scripts I've written.

But maybe it's a remnant of the medieval Py2.7, as this does not happen in the test runs I've done on the port in progress (Python 3.10).

For the filename to be legal, what you need to do is look at the character encoding.

TheTechRobo commented 2 years ago

Python 3 is UTF-8 by default, so that makes sense.
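
Separately from encoding, Windows does not allow the characters < > : " / \ | ? * or ASCII control characters in file names, and every failing path reported above contains a "?". A minimal sketch of a workaround, assuming it is acceptable to substitute those characters before opening the file: the helper name sanitize_filename and the "_" replacement are illustrative choices, not part of dumpgenerator.py.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name, replacement="_"):
    """Hypothetical helper: make a single file name safe to create on Windows."""
    cleaned = _WINDOWS_RESERVED.sub(replacement, name)
    # Windows does not keep trailing dots or spaces in file names either.
    cleaned = cleaned.rstrip(". ")
    # Never return an empty name.
    return cleaned or replacement


print(sanitize_filename("Jinglebell(global?).jpg"))
# prints: Jinglebell(global_).jpg
```

Two different titles can map to the same sanitized name (for example "a?.jpg" and "a*.jpg"), so a real fix would also need some way to keep the names distinct, such as appending a short hash of the original title.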