appliedsec / pygeoip

DEPRECATED: Pure Python API for Maxmind's binary GeoIP databases
https://github.com/appliedsec/pygeoip
GNU Lesser General Public License v3.0

Unicode encoded data returned as ascii encoded string. #1

Closed: jennifere closed this issue 12 years ago

jennifere commented 12 years ago

Reproduction Steps:

  1. Run code that fetches a GeoIP entry containing non-ASCII characters:

    import pygeoip

    gi = pygeoip.GeoIP('MyPathToGeoIPOrganizationFile')
    org = gi.org_by_addr('67.215.9.186')
    sorg = u'org = %s' % org

Expected output:

org = Tirabout François

Actual Output:

<type 'exceptions.UnicodeDecodeError'> 'ascii' codec can't decode byte 0xc3 in position 13: ordinal not in range(128)
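The traceback can be reproduced without pygeoip. In Python 2, u'org = %s' % org implicitly decodes the byte string org with the default ascii codec, and byte 0xc3 at position 13 is consistent with the start of a UTF-8-encoded "ç" in "Tirabout François". The sketch below is written for Python 3, where that implicit step has to be spelled out with .decode(); the RAW_ORG value and both helper functions are illustrative assumptions, not pygeoip code.

```python
# Assumption: the organization name came back as UTF-8 bytes
# (a Python 2 str); this value is illustrative, not read from a database.
RAW_ORG = 'Tirabout François'.encode('utf-8')


def format_with_ascii(raw):
    """Mimic Python 2's implicit ascii decode in u'org = %s' % raw."""
    return 'org = %s' % raw.decode('ascii')


def format_with_encoding(raw, encoding='utf-8'):
    """Decode with an explicitly chosen codec before formatting."""
    return 'org = %s' % raw.decode(encoding)


try:
    format_with_ascii(RAW_ORG)
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 13: ...

print(format_with_encoding(RAW_ORG))  # org = Tirabout François
```

Decoding explicitly with a known codec avoids relying on the interpreter-wide default encoding.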

Versions:

OS: Windows Vista
Python: 2.7.2
pygeoip: 2.2

Workaround:

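Change Python's default encoding to UTF-8, so that the implicit str-to-unicode conversion decodes pygeoip's byte strings as UTF-8 instead of ASCII:

  1. Create the file [PYTHON]\Lib\site-packages\sitecustomize.py with the following contents:

    import sys
    # Must run from sitecustomize: site.py deletes sys.setdefaultencoding afterwards.
    sys.setdefaultencoding('utf8')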

chrish42 commented 12 years ago

At least in the free GeoLiteCity.dat file, the encoding used is Latin-1, not UTF-8. I convert all the str values in the dict returned by the GeoIP methods to Unicode myself, but this is something pygeoip should be doing on its own.
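A minimal sketch of that manual conversion, written for Python 3 where Python 2's str corresponds to bytes. The record's keys and values are hypothetical, and decode_record is an illustrative helper, not part of pygeoip; Latin-1 is assumed per the comment above.

```python
# Hypothetical record: keys and values are illustrative only,
# standing in for what a lookup returned under Python 2 (byte strings).
RECORD = {
    'city': 'Montréal'.encode('latin-1'),
    'country_code': b'CA',
    'latitude': 45.5,
}


def decode_record(record, encoding='latin-1'):
    """Return a copy of record with every bytes value decoded to text.

    Non-bytes values (numbers, None) are passed through unchanged.
    """
    return {
        key: value.decode(encoding) if isinstance(value, bytes) else value
        for key, value in record.items()
    }


print(decode_record(RECORD)['city'])  # Montréal

# Latin-1 bytes are generally not valid UTF-8, so a blanket utf8
# default encoding would fail on this data.
try:
    RECORD['city'].decode('utf-8')
except UnicodeDecodeError:
    print('Latin-1 bytes are not valid UTF-8')
```

The encoding is a parameter because, as this thread shows, different databases may not use the same one.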

gglockner commented 12 years ago

I'm seeing the same issue. I believe it could be fixed in the source by opening _filehandle with codecs.open() rather than the built-in open() function, which returns byte strings.

I haven't tested this myself. Currently, I'm doing what chrish42 is doing: manually convert the result to Unicode.
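Because the database is a binary file, a codec applied to the whole stream (as codecs.open() would do) would also see non-text bytes. One alternative is to keep reading bytes and decode only the text fields. The sketch below assumes text fields are stored as NUL-terminated strings; the buffer layout and the read_cstring helper are hypothetical, not pygeoip's actual reader.

```python
# Hypothetical buffer standing in for a slice of a binary database:
# a few non-text bytes, a NUL-terminated name, then more non-text bytes.
BUF = b'\x01\x02\x03' + 'Tirabout François'.encode('utf-8') + b'\x00\xff\xfe'


def read_cstring(buf, offset, encoding='utf-8'):
    """Decode the NUL-terminated byte string that starts at offset."""
    end = buf.index(b'\x00', offset)
    return buf[offset:end].decode(encoding)


print(read_cstring(BUF, 3))  # Tirabout François

# Decoding the whole buffer instead trips over the binary bytes.
try:
    BUF.decode('utf-8')
except UnicodeDecodeError as exc:
    print('whole-buffer decode failed:', exc.reason)
```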

tiwilliam commented 12 years ago

Parts of this commit were reverted because of other problems (see issue #16). We will have to look at this again to make sure it works.