Bug in reCAPTCHA Python library + patch

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Enter a character outside the ASCII range (code point 128 or above) using the Python library
2. Submit
3. See the error

What is the expected output? What do you see instead?

I expect it to fail and set an error code that signals a byte with a value
greater than \x7f is not possible to encode.

What version of the product are you using? On what operating system?

Ubuntu Linux 7.10, Python 2.5 (this will happen everywhere I think)

I've added a patch, generated from the root of the working copy, to fix this
error. Let me know if this is how things should be handled.

Original issue reported on code.google.com by ryankas...@gmail.com on 24 Dec 2007 at 7:28

Attachments:

ryan_kaskel_fix

GoogleCodeExporter commented 8 years ago

Fixes encoding bug. 

What will happen:

response = páginas de España 
....response=p%C3%A1ginas+de+Espa%C3%B1a....  -> What is sent to server.

Just add doseq=1 to the urllib.urlencode call. I've submitted a trivial patch anyway.

Original comment by ryankas...@gmail.com on 24 Dec 2007 at 10:58

Attachments:

ryan_kaskel_fix

GoogleCodeExporter commented 8 years ago

This assumes the server can make sense of p%C3%A1ginas+de+Espa%C3%B1a (the
UTF-8 bytes of the text, percent-encoded in hex).
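
As a rough illustration of what the receiving side would need to do: the escapes in that value are the UTF-8 bytes of the text, so undoing the percent-encoding and decoding the bytes as UTF-8 recovers the original string. The sketch below is only an assumption about how a receiver might handle it. The helper name `decode_response` is made up here, and nothing in this thread says what the reCAPTCHA server actually does. The try/except only picks the right import for Python 2 or 3.

```python
try:
    from urllib.parse import unquote_to_bytes  # Python 3

    def _percent_decode(value):
        # '+' stands for a space in form-encoded data
        return unquote_to_bytes(value.replace("+", " "))
except ImportError:
    from urllib import unquote_plus  # Python 2

    def _percent_decode(value):
        # on a byte string this returns the raw bytes
        return unquote_plus(value)


def decode_response(value):
    """Hypothetical helper: percent-decode, then interpret the bytes as UTF-8."""
    return _percent_decode(value).decode("utf-8")


decoded = decode_response("p%C3%A1ginas+de+Espa%C3%B1a")
```

If the receiver assumed a different encoding (Latin-1, say), the same bytes would decode to different characters, which is why the assumption matters.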

Original comment by ryankas...@gmail.com on 24 Dec 2007 at 11:06

GoogleCodeExporter commented 8 years ago

related thread at
http://groups.google.com/group/recaptcha/browse_thread/thread/1729f7edc1de0d4b/5cb1894af65fe327

Original comment by jabron...@gmail.com on 30 Oct 2008 at 3:15

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

relevant python bug: http://bugs.python.org/issue1349732

Original comment by jabron...@gmail.com on 30 Oct 2008 at 6:13

GoogleCodeExporter commented 8 years ago

"doseq=1" is a red herring. Observe:

>>> urllib.urlencode({'response': 'páginas de España'})
'response=p%C3%A1ginas+de+Espa%C3%B1a'
>>> # this looks fine

>>> urllib.urlencode({'response': 'páginas de España'}, doseq=1)
'response=p%C3%A1ginas+de+Espa%C3%B1a'
>>> # doseq=1 produces the same (correct) result, so far there's no reason to use it

>>> # note that we've been passing a string...
>>> # if instead we pass a unicode, characters will be converted to '?' (%3F)
>>> # since urlencode calls s.encode('ASCII', 'replace'):
>>> urllib.urlencode({'response': u'páginas de España'}, doseq=1)
'response=p%3Fginas+de+Espa%3Fa'
>>> # this is not what we want!

>>> # here the "doseq=1" is having a mitigating effect (albeit not the one we want),
>>> # since we're passing a unicode. Without it we get an error:
>>> urllib.urlencode({'response': u'páginas de España'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 1250, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)

From your original patch, it's clear you're passing a unicode, as you were
catching the UnicodeEncodeError generated by something like the above. In that
case, doseq=1 will result in characters being converted to '?' as shown, which
is not what we want.
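
One way to sidestep both failure modes shown above is to encode unicode values to UTF-8 byte strings before handing them to urlencode, so it never has to fall back to ASCII. This is a minimal sketch under that assumption, not necessarily what the eventual fix does. `to_utf8` and `build_query` are names made up for illustration, and the try/except only selects the import for Python 2 or 3.

```python
try:
    from urllib import urlencode  # Python 2
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)
except ImportError:
    from urllib.parse import urlencode  # Python 3
    text_type = str


def to_utf8(value):
    """Hypothetical helper: turn text into UTF-8 bytes, leave anything else alone."""
    if isinstance(value, text_type):
        return value.encode("utf-8")
    return value


def build_query(params):
    """Hypothetical helper: urlencode a dict after encoding its text values."""
    return urlencode(dict((key, to_utf8(val)) for key, val in params.items()))


query = build_query({"response": u"p\xe1ginas de Espa\xf1a"})
```

With the values already encoded, doseq is not needed for the encoding to come out right, and no characters are replaced with '?'.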

Original comment by jabron...@gmail.com on 30 Oct 2008 at 5:32

GoogleCodeExporter commented 8 years ago

Fixed in r106.

Original comment by jabron...@gmail.com on 30 Oct 2008 at 6:26

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

Original comment by adrian.g...@gmail.com on 30 Mar 2012 at 6:19

Changed state: Verified
