danielpronych / python-twitter

Automatically exported from code.google.com/p/python-twitter
Apache License 2.0

Unicode problem #210

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Prepare a non-ASCII search term:
   searchterm = 'ð'
   twitterApi = twitter.Api()
   searchterm = unicode(searchterm, 'utf-8').encode('utf-8')
2. Search for the term. The search fails with:
   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

What is the expected output? What do you see instead?
I expect the twitter API search to work.

What version of the product are you using? On what operating system?
Latest version, Mac OS X

Please provide any additional information below.

I was digging into the code and the problem is in this function:

def _Encode(self, s):
  if self._input_encoding:
    return unicode(s, self._input_encoding).encode('utf-8')
  else:
    return unicode(s).encode('utf-8')

Note that s is already a UTF-8-encoded byte string (a str, not a unicode 
object) containing non-ASCII bytes. Since _input_encoding is not set, 
unicode(s).encode('utf-8') is executed; unicode(s) decodes with the default 
ASCII codec, and this fails. If this line were instead 
unicode(s, 'utf-8').encode('utf-8'), it would work.
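
The decode failure can be reproduced in isolation. The sketch below is written for Python 3, where the unicode() built-in no longer exists; it assumes that bytes.decode('ascii') is a fair stand-in for what Python 2's unicode(s) does to a byte string. It is an illustration only, not code from python-twitter.

```python
# Python 3 sketch. Assumption: bytes.decode('ascii') stands in for Python 2's
# unicode(s), which decodes byte strings with the default ASCII codec.
raw = '\u00f0'.encode('utf-8')  # the search term 'ð' as UTF-8 bytes: b'\xc3\xb0'

try:
    raw.decode('ascii')  # mirrors unicode(s) when no input encoding is set
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding with the correct codec succeeds and round-trips back to the same bytes.
decoded = raw.decode('utf-8')  # mirrors unicode(s, 'utf-8')
round_trip_ok = decoded.encode('utf-8') == raw

print(ascii_error)
print(round_trip_ok)
```

The first byte of the encoded term is 0xc3, which matches the byte named in the reported error.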

If I instead pass input_encoding = 'utf-8' to the twitter.Api() initializer, 
it fails again. This time it's because it's trying to encode the other 
parameters in the search. When it gets to the page number, which in our 
example is the integer 1, the following call raises a TypeError because 1 is 
an integer and not a string:

unicode(1, 'utf-8').encode('utf-8')

unicode(1).encode('utf-8') would work, but we can't use it because of the 
previous error.
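
One way to avoid both failures is to branch on the type of the value before encoding it. The following is a minimal Python 3 sketch of that idea; encode_param is a hypothetical helper name and is not part of python-twitter, and the default of 'utf-8' for input_encoding is an assumption.

```python
def encode_param(value, input_encoding='utf-8'):
    """Return value as UTF-8 bytes (hypothetical helper, not library code)."""
    if isinstance(value, bytes):
        # Byte strings: decode with the declared input encoding, then re-encode.
        return value.decode(input_encoding).encode('utf-8')
    if isinstance(value, str):
        # Text is already decoded, so it only needs encoding.
        return value.encode('utf-8')
    # Integers such as the page number: convert to text first, then encode.
    return str(value).encode('utf-8')


print(encode_param('\u00f0'))     # text search term
print(encode_param(b'\xc3\xb0'))  # search term already encoded as UTF-8
print(encode_param(1))            # page number
```

With this shape, the search term and the page number go through different branches, so neither the ASCII decode nor the integer decode is attempted.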

Original issue reported on code.google.com by delso...@gmail.com on 26 Sep 2011 at 9:44