WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

commonsdownloader terminates for special characters #45

Open emijrp opened 10 years ago

emijrp commented 10 years ago

From nemow...@gmail.com on March 01, 2012 08:31:00

Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_savedas)): #empty file?...
  File "/usr/lib/python2.7/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2005/03/23/20081110210524!\"Colored\"_drinking_fountain_from_mid-20th_century_with_african-american_drinking.jpg'

Original issue: http://code.google.com/p/wikiteam/issues/detail?id=45
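A minimal sketch of one way to keep the two paths from diverging, assuming Python 3 and using `urllib.request` in place of the external `wget` call that the logs suggest the script makes. The file name is kept as a plain string and never passes through a shell, so the path used for writing and the path handed to `os.path.getsize` are the same object. `download_and_check` is a hypothetical helper, not part of commonsdownloader.py.

```python
import os
import shutil
import urllib.request


def download_and_check(url, savepath, filename):
    """Download url to savepath/filename and return the saved size in bytes.

    Hypothetical helper: the same unescaped filename is used both for
    writing and for the size check, so no shell quoting is involved.
    """
    os.makedirs(savepath, exist_ok=True)
    target = os.path.join(savepath, filename)
    # Stream the response body straight into the target file.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    # Same string as above, so quotes, "$" and non-ASCII cannot cause a mismatch.
    return os.path.getsize(target)
```

A return value of 0 corresponds to the script's `#empty file?` case. The trade-off of writing the file from Python is losing wget's retry and progress output.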

emijrp commented 10 years ago

From ad...@alphacorp.tk on April 15, 2012 03:41:14

Same here:

--2012-04-15 10:38:14-- http://upload.wikimedia.org/wikipedia/commons/archive/c/cd/20070605200920%21US_%24100_reverse.jpg
Resolving upload.wikimedia.org... 208.80.152.211
Connecting to upload.wikimedia.org|208.80.152.211|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 130793 (128K) [image/jpeg]
Saving to: `2006/02/05/20070605200920!US__reverse.jpg'

100%[===================================================================================================>] 130,793 --.-K/s in 0.03s

2012-04-15 10:38:14 (4.07 MB/s) - `2006/02/05/20070605200920!US__reverse.jpg' saved [130793/130793]

Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_savedas)): #empty file?...
  File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2006/02/05/20070605200920!US$100_reverse.jpg'

In this case the problem is the dollar sign, and it blocks archiving a specific date.
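The saved name `US__reverse.jpg` has lost the `$100` that the URL encodes as `%24100`, which suggests the name was reinterpreted on the way to disk. Assuming (the script's source is not quoted in this thread) that the name reaches wget through a shell command line, a sketch of the safer alternatives in Python 3.8+: pass wget an argument list with no shell at all, or, if a single command string is unavoidable, quote every argument with `shlex.quote` (here via `shlex.join`). `build_wget_args` is a hypothetical helper; `-O` is wget's standard output-file option.

```python
import shlex


def build_wget_args(url, target):
    """Return an argument list that saves url to target (hypothetical helper)."""
    # Each element reaches wget verbatim when run without a shell,
    # e.g. subprocess.run(args, check=True).
    return ["wget", "-O", target, url]


args = build_wget_args(
    "http://upload.wikimedia.org/wikipedia/commons/archive/c/cd/"
    "20070605200920%21US_%24100_reverse.jpg",
    "2006/02/05/20070605200920!US_$100_reverse.jpg",
)

# Only if a single shell command string is unavoidable: quote every argument.
# shlex.quote() wraps anything containing $, ", ! etc. in single quotes.
command = shlex.join(args)  # Python 3.8+
print(command)
```

The printed command carries the target name inside single quotes, where a POSIX shell performs no `$` expansion, and `shlex.split(command)` gives back the original list.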

emijrp commented 10 years ago

From ad...@alphacorp.tk on April 30, 2012 07:40:52

Just to provide more context to this issue:

20070605200920!US_$100reverse.jpg
20070605200920!US$100reverse.jpg
20060205091528
--2012-04-30 14:39:49-- http://upload.wikimedia.org/wikipedia/commons/archive/c/cd/20070605200920%21US%24100_reverse.jpg
Resolving upload.wikimedia.org... 208.80.154.235
Connecting to upload.wikimedia.org|208.80.154.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 130793 (128K) [image/jpeg]
Saving to: `2006/02/05/20070605200920!US__reverse.jpg'

100%[===========================================================================================>] 130,793 --.-K/s in 0.1s

2012-04-30 14:39:49 (916 KB/s) - `2006/02/05/20070605200920!US__reverse.jpg' saved [130793/130793]

Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_savedas)): #empty file?...
  File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2006/02/05/20070605200920!US$100_reverse.jpg'

emijrp commented 10 years ago

From ad...@alphacorp.tk on May 03, 2012 05:31:18

A more severe variant of this issue, which looks like an encoding problem:

20101107161046!��жиквме��е�обакой�и�ае�газе��"�е�е�ний_Ро��ов",_Ро��ов-на-�он�.jpg
20101107161046!��жиквме��е�обакой�и�ае�газе��"�е�е�ний_Ро��ов",Ро��ов-на-�он�.jpg
20060805124041
--2012-05-03 10:33:12-- http://upload.wikimedia.org/wikipedia/commons/archive/2/26/20101107161046%21%D0%9C%D1%83%D0%B6%D0%B8%D0%BA%D0%B2%D0%BC%D0%B5%D1%81%D1%82%D0%B5%D1%81%D1%81%D0%BE%D0%B1%D0%B0%D0%BA%D0%BE%D0%B9%D1%87%D0%B8%D1%82%D0%B0%D0%B5%D1%82%D0%B3%D0%B0%D0%B7%D0%B5%D1%82%D1%83%22%D0%92%D0%B5%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D0%B9%D0%A0%D0%BE%D1%81%D1%82%D0%BE%D0%B2%22%2C%D0%A0%D0%BE%D1%81%D1%82%D0%BE%D0%B2-%D0%BD%D0%B0-%D0%94%D0%BE%D0%BD%D1%83.jpg
Resolving upload.wikimedia.org... 208.80.154.235
Connecting to upload.wikimedia.org|208.80.154.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 974373 (952K) [image/jpeg]
Saving to: `2006/08/05/20101107161046!\320\234\321\203\320\266\320\270\320\272\320\262\320\274\320\265\321\201\321\202\320\265\321\201\321\201\320\276\320\261\320\260\320\272\320\276\320\271\321\207\320\270\321\202\320\260\320\265\321\202\320\263\320\260\320\267\320\265\321\202\321\203"\320\222\320\265\321\207\320\265\321\200\320\275\320\270\320\271\320\240\320\276\321\201\321\202\320\276\320\262",_\320\240\320\276\321\201\321\202\320\276\320\262-\320\275\320\260-\320\224\320\276\320\275\321\203.jpg'

100%[============================================================================================>] 974,373 2.10M/s in 0.4s

2012-05-03 10:33:13 (2.10 MB/s) - `2006/08/05/20101107161046!\320\234\321\203\320\266\320\270\320\272\320\262\320\274\320\265\321\201\321\202\320\265\321\201\321\201\320\276\320\261\320\260\320\272\320\276\320\271\321\207\320\270\321\202\320\260\320\265\321\202\320\263\320\260\320\267\320\265\321\202\321\203"\320\222\320\265\321\207\320\265\321\200\320\275\320\270\320\271\320\240\320\276\321\201\321\202\320\276\320\262",\320\240\320\276\321\201\321\202\320\276\320\262-\320\275\320\260-\320\224\320\276\320\275\321\203.jpg' saved [974373/974373]

Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_savedas)): #empty file?...
  File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2006/08/05/20101107161046!\xd0\x9c\xd1\x83\xd0\xb6\xd0\xb8\xd0\xba\xd0\xb2\xd0\xbc\xd0\xb5\xd1\x81\xd1\x82\xd0\xb5\xd1\x81\xd1\x81\xd0\xbe\xd0\xb1\xd0\xb0\xd0\xba\xd0\xbe\xd0\xb9\xd1\x87\xd0\xb8\xd1\x82\xd0\xb0\xd0\xb5\xd1\x82\xd0\xb3\xd0\xb0\xd0\xb7\xd0\xb5\xd1\x82\xd1\x83\"\xd0\x92\xd0\xb5\xd1\x87\xd0\xb5\xd1\x80\xd0\xbd\xd0\xb8\xd0\xb9\xd0\xa0\xd0\xbe\xd1\x81\xd1\x82\xd0\xbe\xd0\xb2\",_\xd0\xa0\xd0\xbe\xd1\x81\xd1\x82\xd0\xbe\xd0\xb2-\xd0\xbd\xd0\xb0-\xd0\x94\xd0\xbe\xd0\xbd\xd1\x83.jpg'
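The three spellings of the name in this report can be compared byte by byte. The sketch below (Python 3, standard library only) checks, for the fragment `Ростов-на-Дону` taken from the URL above, that the URL's percent-encoding, wget's octal escapes and the traceback's hex escapes are the same UTF-8 bytes. If they are, the Cyrillic part is encoded consistently, and the visible difference between the saved path and the path passed to `os.path.getsize` appears to be the backslash before each `"`, the same pattern as in the first report.

```python
import urllib.parse

fragment = "Ростов-на-Дону"
utf8 = fragment.encode("utf-8")

# Percent-encoding, as in the upload.wikimedia.org URL.
percent = urllib.parse.quote(fragment)

# Octal escapes for non-ASCII bytes, as in wget's "Saving to:" line.
octal = "".join("\\%03o" % b if b >= 0x80 else chr(b) for b in utf8)

# Hex escapes for non-ASCII bytes, as in Python 2's repr() in the traceback.
hexed = "".join("\\x%02x" % b if b >= 0x80 else chr(b) for b in utf8)

print(percent)
print(octal)
print(hexed)
```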

emijrp commented 10 years ago

From nemow...@gmail.com on August 04, 2013 07:54:57

Again:

Saving to: `2007/08/04/20091201155308!"Meillandine"_Rose_in_clay_pot.jpg'

100%[=======================================================================================================================================>] 2,035,414 2.37M/s in 0.8s

2013-08-04 11:33:31 (2.37 MB/s) - `2007/08/04/20091201155308!"Meillandine"_Rose_in_clay_pot.jpg' saved [2035414/2035414]

Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_savedas)): #empty file?...
  File "/usr/lib/python2.7/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2007/08/04/20091201155308!\"Meillandine\"_Rose_in_clay_pot.jpg'


The file was not actually saved (or at least I can't find it), so I don't know at which point the problem occurs.
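When the expected path is missing, one way to see what, if anything, was written is to list the target directory and look for entries sharing the `timestamp!` prefix of the expected name; judging by the archive URLs in this thread, old revisions are named `<timestamp>!<title>`. `find_candidates` is a hypothetical diagnostic helper; it does not explain a file that never reached that directory at all.

```python
import os


def find_candidates(directory, expected_name):
    """Return names in directory that share the 'timestamp!' prefix of expected_name.

    Hypothetical helper: the part before '!' identifies the revision even
    when the rest of the name was altered (escaped, expanded) on the way to disk.
    """
    prefix = expected_name.split("!", 1)[0] + "!"
    if not os.path.isdir(directory):
        return []
    return sorted(name for name in os.listdir(directory) if name.startswith(prefix))
```

For the Meillandine case this would compare the escaped name from the traceback against whatever actually sits in `2007/08/04`.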

emijrp commented 10 years ago

From nemow...@gmail.com on August 04, 2013 09:24:14

With r825 it at least continues, but the underlying problem remains.
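The r825 change itself is not shown in this thread. As an assumption, a minimal sketch of the behaviour described (treating a missing file like an empty one instead of letting `os.path.getsize` raise `OSError`) could look like this. It only keeps the run going; the name mismatch is still there. `is_missing_or_empty` is a hypothetical helper.

```python
import os


def is_missing_or_empty(path):
    """Return True if path does not exist or has zero bytes (hypothetical helper).

    Unlike a bare os.path.getsize(), this does not raise OSError for a
    missing file, so a download loop can log the name and move on.
    """
    try:
        return os.path.getsize(path) == 0
    except OSError:
        return True
```

Logging the name whenever this returns True keeps the affected files visible for a later fix of the naming itself.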