Closed GoogleCodeExporter closed 8 years ago
Fixes encoding bug.
What will happen:
response = páginas de España
....response=p%C3%A1ginas+de+Espa%C3%B1a.... -> What is sent to server.
Just add doseq=1 to the urllib.urlencode call. I've submitted a trivial patch
anyway.
Original comment by ryankas...@gmail.com
on 24 Dec 2007 at 10:58
This assumes the server can make sense of p%C3%A1ginas+de+Espa%C3%B1a (the UTF-8
bytes of each non-ASCII character, percent-encoded in hex).
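The escapes in that string are the percent-encoded UTF-8 bytes of each character, so a server that decodes form data as UTF-8 should recover the original text. A minimal sketch of that round trip, assuming Python 3's urllib.parse (the thread itself uses Python 2's urllib); the variable names are chosen for illustration only:

```python
from urllib.parse import unquote_plus

# The form value exactly as it appears on the wire in this report.
wire_value = "p%C3%A1ginas+de+Espa%C3%B1a"

# 'á' is U+00E1, but its UTF-8 encoding is the two bytes C3 A1,
# which is what shows up as %C3%A1.
a_acute_utf8 = "á".encode("utf-8")

# unquote_plus turns '+' back into a space and decodes the
# percent-escaped bytes as UTF-8 by default.
decoded = unquote_plus(wire_value)
print(decoded)
```

Whether a given server actually decodes the value as UTF-8 is a separate question that this sketch does not answer.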
Original comment by ryankas...@gmail.com
on 24 Dec 2007 at 11:06
related thread at
http://groups.google.com/group/recaptcha/browse_thread/thread/1729f7edc1de0d4b/5cb1894af65fe327
Original comment by jabron...@gmail.com
on 30 Oct 2008 at 3:15
relevant python bug: http://bugs.python.org/issue1349732
Original comment by jabron...@gmail.com
on 30 Oct 2008 at 6:13
"doseq=1" is a red herring. Observe:
>>> urllib.urlencode({'response': 'páginas de España'})
'response=p%C3%A1ginas+de+Espa%C3%B1a'
>>> # this looks fine
>>> urllib.urlencode({'response': 'páginas de España'}, doseq=1)
'response=p%C3%A1ginas+de+Espa%C3%B1a'
>>> # doseq=1 produces the same (correct) result, so far there's no reason to use it
>>> # note that we've been passing a string...
>>> # if instead we pass a unicode, characters will be converted to '?' (%3F)
>>> # since urlencode calls s.encode('ASCII', 'replace'):
>>> urllib.urlencode({'response': u'páginas de España'}, doseq=1)
'response=p%3Fginas+de+Espa%3Fa'
>>> # this is not what we want!
>>> # here the "doseq=1" is having a mitigating effect (albeit not the one we want),
>>> # since we're passing a unicode. Without it we get an error:
>>> urllib.urlencode({'response': u'páginas de España'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 1250, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)
From your original patch, it's clear you're passing a unicode, since you were
catching the UnicodeEncodeError generated by something like the above. In that
case, doseq=1 will result in non-ASCII characters being converted to '?' as
shown, which is not what we want.
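The session above suggests the remedy is to encode unicode values as UTF-8 before they reach urlencode, rather than to pass doseq=1. A minimal sketch of that idea, written for Python 3 so that it runs today (there, str plays the role of Python 2's unicode). The helper name encode_if_necessary and the parameter dict are assumptions for illustration, not a record of what the actual fix contains:

```python
from urllib.parse import urlencode


def encode_if_necessary(value):
    """Return UTF-8 bytes for text input; pass anything else through unchanged."""
    if isinstance(value, str):
        return value.encode("utf-8")
    return value


# Hypothetical form parameters; only 'response' comes from this report.
params = {"response": encode_if_necessary("páginas de España")}

# Bytes values are percent-encoded byte for byte, so nothing is lost.
encoded = urlencode(params)

# For comparison: forcing ASCII with errors='replace' reproduces the
# lossy '?' (%3F) substitution shown in the session above.
lossy = urlencode({"response": "páginas de España"},
                  encoding="ascii", errors="replace")

print(encoded)
print(lossy)
```

Encoding at the call site keeps the choice of UTF-8 explicit instead of depending on urlencode's handling of unicode input.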
Original comment by jabron...@gmail.com
on 30 Oct 2008 at 5:32
Fixed in r106.
Original comment by jabron...@gmail.com
on 30 Oct 2008 at 6:26
Original comment by adrian.g...@gmail.com
on 30 Mar 2012 at 6:19
Original issue reported on code.google.com by
ryankas...@gmail.com
on 24 Dec 2007 at 7:28