elyase / geotext

Geotext extracts country and city mentions from text
MIT License

UnicodeDecodeError with Python 3 on Windows #3

Open shinstar123 opened 7 years ago

shinstar123 commented 7 years ago

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined>

elyase commented 7 years ago

What was the text fragment that triggered the error?

shinstar123 commented 7 years ago

I just ran this example:

from geotext import GeoText

places = GeoText("London is a great city")
places.cities

GeoText('New York, Texas, and also China').country_mentions

and that error came up.

elyase commented 7 years ago

Unfortunately I can't reproduce the issue, can you try installing in a fresh environment?

snippsat commented 7 years ago

My fix for someone else who had this problem.

Run with error: http://pastebin.com/d0N7Q9cZ

Fix:
with open(filename, 'r') as f:
To:
with open(filename, 'r', encoding='utf-8') as f:

Test:

(geo_test) C:\Python36\geo_test
λ python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from geotext import GeoText
>>> places = GeoText("London is a great city")
>>> places.cities
['London']

>>> GeoText('New York, Texas, and also China').country_mentions
OrderedDict([('US', 2), ('CN', 1)])

>>> places = GeoText("Oslo is a great city")
>>> places.cities
['Oslo']
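
For anyone wondering why the `encoding='utf-8'` change matters: on Windows, `open()` without an explicit encoding typically falls back to the locale code page (cp1252 in the tracebacks in this thread), and byte 0x81 has no mapping there. The snippet below is only a minimal sketch of that behaviour. The sample row and the temporary file are made up for illustration and are not taken from geotext's data files.

```python
import os
import tempfile

# Hypothetical sample row: "Á" (U+00C1) is encoded in UTF-8 as bytes C3 81,
# so the file contains the byte 0x81 that cp1252 cannot decode.
sample = "Ávila\tES\n"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)


def read_with(encoding):
    """Read the whole temporary file using the given text encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Reading as cp1252 mimics what open(filename, 'r') does on many Windows setups.
try:
    read_with("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print("cp1252:", exc)

# Reading with the explicit encoding from the fix above succeeds.
utf8_text = read_with("utf-8")
print("utf-8:", repr(utf8_text))

os.remove(path)
```

On Linux and macOS the default encoding is usually already UTF-8, which would explain why the error only shows up on Windows.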
iShekhar commented 6 years ago

Error still seen when installing and running on Python 3.4


import geotext
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2017.2.4\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Python34\lib\site-packages\geotext\__init__.py", line 7, in <module>
    from .geotext import GeoText
  File "C:\Program Files\JetBrains\PyCharm 2017.2.4\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Python34\lib\site-packages\geotext\geotext.py", line 87, in <module>
    class GeoText(object):
  File "C:\Python34\lib\site-packages\geotext\geotext.py", line 103, in GeoText
    index = build_index()
  File "C:\Python34\lib\site-packages\geotext\geotext.py", line 77, in build_index
    cities = read_table(get_data_path('cities15000.txt'), usecols=[1, 8])
  File "C:\Python34\lib\site-packages\geotext\geotext.py", line 54, in read_table
    for line in lines:
  File "C:\Python34\lib\site-packages\geotext\geotext.py", line 51, in <genexpr>
    lines = (line for line in f if not line.startswith(comment))
  File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined>
dovinmu commented 6 years ago

I'm getting the same error just trying to import geotext, Python 3.6 on Windows 10 with Anaconda. Specifically this error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined> when the IncrementalDecoder tries to open the cities csv.

dovinmu commented 6 years ago

Fixed the problem by using Linux instead of Windows.

iShekhar commented 6 years ago

The question is to solve the issue on WINDOWS!!

dovinmu commented 6 years ago

I may have been being slightly snarky.

elyase commented 6 years ago

Can someone on Windows try again on master?

pip install https://github.com/elyase/geotext/archive/master.zip

Ala1s commented 6 years ago

Tried it, but it didn't solve the problem. snippsat's solution worked for me.

tschlach commented 6 years ago

Having the same problem during the module import

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined>

iwpnd commented 6 years ago

@tschlach And snippsat's solution did not work?

tschlach commented 6 years ago

@iwpnd It seems like snippsat's solution suggests that the UnicodeDecodeError results from reading a text file without specifying the encoding.

I think the error that most of us are encountering occurs when importing the library.

Here's my complete traceback of the error:

Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 12:30:02) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import geotext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\tschlachter\AppData\Local\Continuum\Miniconda3\envs\language\lib\site-packages\geotext\__init__.py", line 7, in <module>
    from .geotext import GeoText
  File "C:\Users\tschlachter\AppData\Local\Continuum\Miniconda3\envs\language\lib\site-packages\geotext\geotext.py", line 87, in <module>
    class GeoText(object):
  File "C:\Users\tschlachter\AppData\Local\Continuum\Miniconda3\envs\language\lib\site-packages\geotext\geotext.py", line 103, in GeoText
    index = build_index()
  File "C:\Users\tschlachter\AppData\Local\Continuum\Miniconda3\envs\language\lib\site-packages\geotext\geotext.py", line 77, in build_index
    cities = read_table(get_data_path('cities15000.txt'), usecols=[1, 8])
  File "C:\Users\tschlachter\AppData\Local\Continuum\Miniconda3\envs\language\lib\site-packages\geotext\geotext.py", line 54, in read_table
    for line in lines:
  File "C:\Users\tschlachter\AppData\Local\Continuum\Miniconda3\envs\language\lib\site-packages\geotext\geotext.py", line 51, in <genexpr>
    lines = (line for line in f if not line.startswith(comment))
  File "C:\Users\tschlachter\AppData\Local\Continuum\Miniconda3\envs\language\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined>
iwpnd commented 6 years ago

@tschlach Yes, and if you look closely at the traceback you can see that read_table() is executed during the import, and that function reads a text file.
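
To illustrate why a file-reading error can surface on a bare `import geotext`: the traceback shows `index = build_index()` sitting in the body of `class GeoText`, and class bodies run when the module is imported. The sketch below only borrows the names `read_table` and `build_index` from the traceback. Their bodies, the `GeoTextSketch` class, and the in-memory data are assumptions for illustration, not the package's actual code.

```python
import io


def read_table(f, comment="#", usecols=(0, 1)):
    """Hypothetical stand-in: skip comment lines, keep the selected columns."""
    lines = (line for line in f if not line.startswith(comment))
    return [
        tuple(line.rstrip("\n").split("\t")[i] for i in usecols)
        for line in lines
    ]


def build_index():
    # The real package reads data files from disk here; this sketch uses
    # an in-memory file so it runs anywhere.
    data = io.StringIO("# name\tcountry\nLondon\tGB\nOslo\tNO\n")
    return dict(read_table(data))


class GeoTextSketch:
    # This line runs once, when the class statement is executed, i.e. at
    # import time. Any decoding error raised while reading the data would
    # therefore appear as soon as the module is imported.
    index = build_index()


print(GeoTextSketch.index)
```

So the import-time error and the missing `encoding` argument are the same problem seen from two angles.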

@elyase I just created a clean venv and installed geotext via

pip install https://github.com/elyase/geotext/archive/master.zip

on a Windows machine with Python 3.6, and everything works as expected.

tschlach commented 6 years ago

@iwpnd

Ah - works perfectly with a fresh install, thanks for the help.

CurtLH commented 6 years ago

I'm trying to build this package for conda-forge, but the build is failing on Windows for the same reason mentioned here.

iwpnd commented 6 years ago

@CurtLH Have you tried any of the fixes mentioned in this issue?

CurtLH commented 6 years ago

@iwpnd -- It seems that the issue has been fixed with this PR but a new version has not yet been uploaded to PyPI or tagged on GitHub.

If you're not familiar with the process at Conda-Forge, recipes should be built from tarballs, not repos. So for now, I've added the Linux and OSX versions of the packages, and as soon as a new release is created, I will add Windows to the Conda-Forge recipe.

iwpnd commented 6 years ago

@CurtLH I see, thanks for the enlightenment :)

VanessaVanG commented 6 years ago

pip install wasn't working for me so I had to do easy_install https://github.com/elyase/geotext/archive/master.zip (or you could do pip install git+https://github.com/elyase/geotext.git)

Everything is working now. Windows, Python 3.6