fsrrt / wikiteam

Automatically exported from code.google.com/p/wikiteam

commonsdownloader terminates for special characters #45

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_saved_as_)): #empty file?...
  File "/usr/lib/python2.7/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2005/03/23/20081110210524!\\"Colored\\"_drinking_fountain_from_mid-20th_century_with_african-american_drinking.jpg'

Original issue reported on code.google.com by nemow...@gmail.com on 1 Mar 2012 at 7:31

GoogleCodeExporter commented 8 years ago
Same here:

--2012-04-15 10:38:14--  http://upload.wikimedia.org/wikipedia/commons/archive/c/cd/20070605200920%21US_%24100_reverse.jpg
Resolving upload.wikimedia.org... 208.80.152.211
Connecting to upload.wikimedia.org|208.80.152.211|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 130793 (128K) [image/jpeg]
Saving to: `2006/02/05/20070605200920!US__reverse.jpg'

100%[===================================================================================================>] 130,793     --.-K/s   in 0.03s

2012-04-15 10:38:14 (4.07 MB/s) - `2006/02/05/20070605200920!US__reverse.jpg' saved [130793/130793]

Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_saved_as_)): #empty file?...
  File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2006/02/05/20070605200920!US_$100_reverse.jpg'

The dollar sign appears to be the problem in this case, and it is a blocker for archiving a specific date.

Original comment by ad...@alphacorp.tk on 15 Apr 2012 at 10:41
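
The wget output above saves the file as `20070605200920!US__reverse.jpg`, while the script then looks for `20070605200920!US_$100_reverse.jpg`. That pattern is consistent with `$100` being expanded by a shell before wget receives the file name, although the thread does not confirm how commonsdownloader.py invokes wget. A minimal sketch, assuming the download is launched from Python: either pass the arguments as a list so that no shell parses them, or quote each argument with `shlex.quote`. The helper names below are hypothetical and are not taken from the script.

```python
import shlex
import subprocess
import sys


def build_wget_argv(url, dest_path):
    """Build an argument list for wget (hypothetical helper).

    Passed to subprocess as a list, these strings never go through a shell,
    so characters such as $, ! and " reach wget unchanged.
    """
    return ["wget", url, "-O", dest_path]


def shell_safe_command(url, dest_path):
    """Build one command string with every argument quoted for a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in build_wget_argv(url, dest_path))


def echo_through_child(text):
    """Send text to a child process as a list argument and return what it saw."""
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    output = subprocess.check_output([sys.executable, "-c", child_code, text])
    return output.decode("utf-8")


if __name__ == "__main__":
    url = ("http://upload.wikimedia.org/wikipedia/commons/archive/c/cd/"
           "20070605200920%21US_%24100_reverse.jpg")
    dest = "2006/02/05/20070605200920!US_$100_reverse.jpg"
    print(build_wget_argv(url, dest))
    print(shell_safe_command(url, dest))
    # The child process receives the name with "$100" intact.
    print(echo_through_child(dest))
```

The list form is usually the simpler choice because it removes the quoting question entirely; the quoted string is only needed when a shell command line cannot be avoided.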

GoogleCodeExporter commented 8 years ago
Just to provide more context to this issue:

20070605200920!US_$100_reverse.jpg 20070605200920!US_$100_reverse.jpg 20060205091528
--2012-04-30 14:39:49--  http://upload.wikimedia.org/wikipedia/commons/archive/c/cd/20070605200920%21US_%24100_reverse.jpg
Resolving upload.wikimedia.org... 208.80.154.235
Connecting to upload.wikimedia.org|208.80.154.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 130793 (128K) [image/jpeg]
Saving to: `2006/02/05/20070605200920!US__reverse.jpg'

100%[===========================================================================================>] 130,793     --.-K/s   in 0.1s

2012-04-30 14:39:49 (916 KB/s) - `2006/02/05/20070605200920!US__reverse.jpg' saved [130793/130793]

Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_saved_as_)): #empty file?...
  File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2006/02/05/20070605200920!US_$100_reverse.jpg'

Original comment by ad...@alphacorp.tk on 30 Apr 2012 at 2:40

GoogleCodeExporter commented 8 years ago
A more severe case of this, which looks like an encoding issue:

20101107161046!Мужик_вместе_с_собакой_читает_газету_"Вечерний_Ростов",_Ростов-на-Дону.jpg 20101107161046!Мужик_вместе_с_собакой_читает_газету_"Вечерний_Ростов",_Ростов-на-Дону.jpg 20060805124041
--2012-05-03 10:33:12--  http://upload.wikimedia.org/wikipedia/commons/archive/2/26/20101107161046%21%D0%9C%D1%83%D0%B6%D0%B8%D0%BA_%D0%B2%D0%BC%D0%B5%D1%81%D1%82%D0%B5_%D1%81_%D1%81%D0%BE%D0%B1%D0%B0%D0%BA%D0%BE%D0%B9_%D1%87%D0%B8%D1%82%D0%B0%D0%B5%D1%82_%D0%B3%D0%B0%D0%B7%D0%B5%D1%82%D1%83_%22%D0%92%D0%B5%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D0%B9_%D0%A0%D0%BE%D1%81%D1%82%D0%BE%D0%B2%22%2C_%D0%A0%D0%BE%D1%81%D1%82%D0%BE%D0%B2-%D0%BD%D0%B0-%D0%94%D0%BE%D0%BD%D1%83.jpg
Resolving upload.wikimedia.org... 208.80.154.235
Connecting to upload.wikimedia.org|208.80.154.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 974373 (952K) [image/jpeg]
Saving to: `2006/08/05/20101107161046!\320\234\321\203\320\266\320\270\320\272_\320\262\320\274\320\265\321\201\321\202\320\265_\321\201_\321\201\320\276\320\261\320\260\320\272\320\276\320\271_\321\207\320\270\321\202\320\260\320\265\321\202_\320\263\320\260\320\267\320\265\321\202\321\203_"\320\222\320\265\321\207\320\265\321\200\320\275\320\270\320\271_\320\240\320\276\321\201\321\202\320\276\320\262",_\320\240\320\276\321\201\321\202\320\276\320\262-\320\275\320\260-\320\224\320\276\320\275\321\203.jpg'

100%[============================================================================================>] 974,373     2.10M/s   in 0.4s

2012-05-03 10:33:13 (2.10 MB/s) - `2006/08/05/20101107161046!\320\234\321\203\320\266\320\270\320\272_\320\262\320\274\320\265\321\201\321\202\320\265_\321\201_\321\201\320\276\320\261\320\260\320\272\320\276\320\271_\321\207\320\270\321\202\320\260\320\265\321\202_\320\263\320\260\320\267\320\265\321\202\321\203_"\320\222\320\265\321\207\320\265\321\200\320\275\320\270\320\271_\320\240\320\276\321\201\321\202\320\276\320\262",_\320\240\320\276\321\201\321\202\320\276\320\262-\320\275\320\260-\320\224\320\276\320\275\321\203.jpg' saved [974373/974373]

Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_saved_as_)): #empty file?...
  File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2006/08/05/20101107161046!\xd0\x9c\xd1\x83\xd0\xb6\xd0\xb8\xd0\xba_\xd0\xb2\xd0\xbc\xd0\xb5\xd1\x81\xd1\x82\xd0\xb5_\xd1\x81_\xd1\x81\xd0\xbe\xd0\xb1\xd0\xb0\xd0\xba\xd0\xbe\xd0\xb9_\xd1\x87\xd0\xb8\xd1\x82\xd0\xb0\xd0\xb5\xd1\x82_\xd0\xb3\xd0\xb0\xd0\xb7\xd0\xb5\xd1\x82\xd1\x83_\\"\xd0\x92\xd0\xb5\xd1\x87\xd0\xb5\xd1\x80\xd0\xbd\xd0\xb8\xd0\xb9_\xd0\xa0\xd0\xbe\xd1\x81\xd1\x82\xd0\xbe\xd0\xb2\\",_\xd0\xa0\xd0\xbe\xd1\x81\xd1\x82\xd0\xbe\xd0\xb2-\xd0\xbd\xd0\xb0-\xd0\x94\xd0\xbe\xd0\xbd\xd1\x83.jpg'

Original comment by ad...@alphacorp.tk on 3 May 2012 at 12:31
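
The three spellings of the name in the log above are the same UTF-8 bytes in different escape notations: percent-escapes in the URL, octal escapes in the wget output, and `\x` escapes in the Python traceback. What does appear to differ is that the traceback path shows a backslash before each double quote (`\\"`), which the name reported by wget does not. A minimal sketch, assuming the local name is meant to match the decoded last segment of the URL; `filename_from_url` is a hypothetical helper, not a function from commonsdownloader.py.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of url with percent-escapes decoded as UTF-8."""
    path = urlsplit(url).path
    return unquote(posixpath.basename(path), encoding="utf-8")


if __name__ == "__main__":
    url = ("http://upload.wikimedia.org/wikipedia/commons/archive/c/cd/"
           "20070605200920%21US_%24100_reverse.jpg")
    name = filename_from_url(url)
    print(name)
    # The decoded name contains no backslashes, so comparing it with the
    # path the script builds would expose any extra escaping of quotes.
    print("\\" in name)
```

Comparing the decoded name byte for byte with the path handed to `os.path.getsize` is one way to narrow down where the two names diverge.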

GoogleCodeExporter commented 8 years ago
Again:

Saving to: `2007/08/04/20091201155308!"Meillandine"_Rose_in_clay_pot.jpg'

100%[=======================================================================================================================================>] 2,035,414   2.37M/s   in 0.8s

2013-08-04 11:33:31 (2.37 MB/s) - `2007/08/04/20091201155308!"Meillandine"_Rose_in_clay_pot.jpg' saved [2035414/2035414]

Traceback (most recent call last):
  File "commonsdownloader.py", line 150, in <module>
    main()
  File "commonsdownloader.py", line 126, in main
    if not os.path.getsize('%s/%s' % (savepath, img_saved_as_)): #empty file?...
  File "/usr/lib/python2.7/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '2007/08/04/20091201155308!\\"Meillandine\\"_Rose_in_clay_pot.jpg'

----

The file was not actually saved, or at least I can't find it, so I don't know at what point the problem occurs.

Original comment by nemow...@gmail.com on 4 Aug 2013 at 2:54

GoogleCodeExporter commented 8 years ago
With r825 it at least continues, but the underlying problem remains.

Original comment by nemow...@gmail.com on 4 Aug 2013 at 4:24
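
Every traceback in this thread comes from calling `os.path.getsize` on a path that does not exist. A minimal sketch of a check that treats a missing file like an empty one, so that a caller can log the name and move on to the next image. This is an assumption about one way to keep going, not a description of what r825 changed, and the function name is hypothetical.

```python
import os


def saved_file_is_empty_or_missing(path):
    """Return True when path has zero size or cannot be stat'ed at all."""
    try:
        return os.path.getsize(path) == 0
    except OSError:
        # Covers "No such file or directory" as seen in the tracebacks.
        return True


if __name__ == "__main__":
    candidate = '2007/08/04/20091201155308!"Meillandine"_Rose_in_clay_pot.jpg'
    if saved_file_is_empty_or_missing(candidate):
        print("skipping, not saved or empty: %s" % candidate)
```

A guard like this only hides the symptom: the name written by wget and the name checked by the script still need to agree for the file to be archived.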