disqus / python-phabricator

Python bindings for Phabricator
Apache License 2.0

latin1 vs unicode: what is the best way to deal with an unknown international list? #20

Closed joggerjoel closed 9 years ago

joggerjoel commented 9 years ago

Input: Quéru

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9: invalid continuation byte

mattrobenolt commented 9 years ago

Can you post a full stacktrace please?

joggerjoel commented 9 years ago

maniphestTitle="約"
result=self.phab.maniphest.createtask(ownerPHID=ownerPHID, title=maniphestTitle)

==> works

maniphestTitle="é"
result=self.phab.maniphest.createtask(ownerPHID=ownerPHID, title=maniphestTitle)
  File "C:\Python27\lib\site-packages\phabricator-0.4.0-py2.7.egg\phabricator\__init__.py", line 218, in __call__
    return self._request(**kwargs)
  File "C:\Python27\lib\site-packages\phabricator-0.4.0-py2.7.egg\phabricator\__init__.py", line 269, in _request
    "params": json.dumps(kwargs),
  File "C:\Python27\lib\json\__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Python27\lib\json\encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 0: unexpected end of data
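
A small sketch of what the traceback above is probably showing, written for Python 3 (the thread uses Python 2.7, where `json.dumps` decodes byte strings as UTF-8 by default). The byte 0xe9 is "é" in Latin-1 but is not a complete UTF-8 sequence, so decoding it as UTF-8 fails, while decoding it as Latin-1 first yields text that serializes cleanly. The variable names are illustrative and not part of python-phabricator.

```python
import json

# "é" as a single Latin-1 byte, the value the traceback complains about.
raw = b"\xe9"

# Decoding as UTF-8 fails: 0xe9 starts a multi-byte sequence that never completes.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("latin-1")

# Once the title is real text, JSON serialization no longer has to guess.
payload = json.dumps({"title": text})
```

Under that assumption, the fix belongs wherever the bytes enter the program: decode them with their real encoding before handing the title to `createtask`.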

joggerjoel commented 9 years ago

I did more research and found an explanation of this issue: http://www.4byte.cn/question/185336/latin-1-vs-unicode-in-python.html. Now I need to figure out what to do when the encoding is unknown.
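
One possible approach when the encoding is unknown, sketched for Python 3: try strict UTF-8 first and fall back to Latin-1 only if that fails. The helper name `to_text` and the order of the candidate encodings are assumptions made for illustration. Latin-1 maps every byte to a character, so it never raises, which makes it a last resort that can silently produce wrong characters for data in some other encoding.

```python
def to_text(value, encodings=("utf-8", "latin-1")):
    """Return value as text, trying each candidate encoding in order."""
    if isinstance(value, str):
        # Already text, nothing to decode.
        return value
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in this encoding, try the next candidate.
            continue
    # No candidate matched; keep the data but mark undecodable bytes.
    return value.decode("utf-8", errors="replace")
```

This is a guess based on the bytes alone. Knowing the encoding at the source is more reliable than detecting it afterwards.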

joggerjoel commented 9 years ago

I traced the original source data, and the problem is unrelated to python-phabricator. Initially all was coming back as unicode. So I had to add the MySQL options (charset='utf8', use_unicode=True) inside the MySQLdb.connect() call, and now it works perfectly.
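
A minimal sketch of that change, assuming the connection settings are kept in one place. The helper `utf8_connect_kwargs` and the host, user, and database values are hypothetical; only `charset` and `use_unicode` come from the comment above.

```python
def utf8_connect_kwargs(**base):
    """Return connection keyword arguments with the two options from this thread added."""
    kwargs = dict(base)
    # The options named in the comment above.
    kwargs.update(charset="utf8", use_unicode=True)
    return kwargs


# Hypothetical connection settings, for illustration only.
settings = utf8_connect_kwargs(host="localhost", user="me", db="mydb")

# With the MySQLdb package installed, the call would then look like:
#   import MySQLdb
#   conn = MySQLdb.connect(**settings)
```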