elyase / geotext

Geotext extracts country and city mentions from text
MIT License

UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence #9

Closed df19900725 closed 6 years ago

df19900725 commented 7 years ago

from geotext import GeoText

places = GeoText("London is a great city")
places.cities
GeoText('New York, Texas, and also China').country_mentions

My system is Windows 10... Running the code fragment above throws an error:

"D:\Program Files\Python3\python.exe" D:/OneDrive/Programs/Jieba/ExtractLocation.py
Traceback (most recent call last):
  File "D:/OneDrive/Programs/Jieba/ExtractLocation.py", line 20, in <module>
    from geotext import GeoText
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\__init__.py", line 7, in <module>
    from .geotext import GeoText
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 87, in <module>
    class GeoText(object):
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 103, in GeoText
    index = build_index()
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 74, in build_index
    get_data_path('countryInfo.txt'), usecols=[4, 0], skip=1)
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 48, in read_table
    next(f)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence
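
For readers unfamiliar with this failure: when open() is called in text mode without an encoding argument, Python decodes with the locale's preferred encoding, which is typically gbk on a Chinese-locale Windows system. The sketch below shows how UTF-8 bytes can fail under that codec. The sample bytes (a UTF-8 byte order mark followed by ASCII text) are an assumption, chosen only because they are one input that fails at the same byte and position; the thread does not confirm what the data file actually begins with.

```python
# Assumed sample: UTF-8 byte order mark (EF BB BF) followed by ASCII text.
# Not taken from countryInfo.txt; chosen only for illustration.
sample = b'\xef\xbb\xbf# comment line\n'

# Decoding as UTF-8 (the '-sig' variant strips the BOM) succeeds.
as_utf8 = sample.decode('utf-8-sig')

# Decoding as gbk fails: EF BB is consumed as one two-byte character,
# then BF is not followed by a valid second byte.
try:
    sample.decode('gbk')
    gbk_error = None
except UnicodeDecodeError as exc:
    gbk_error = exc

print(repr(as_utf8))
print(gbk_error)
```

The same bytes are perfectly valid UTF-8, so the problem lies in which codec is used to read the file, not in the file itself.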

DevinCharles commented 7 years ago

I believe this is the same issue I'm seeing... read_table in geotext.py accepts an encoding argument, but it is never passed to the open() call, so the file is decoded with the platform's default codec instead.

At Line 45:

with open(filename, 'r') as f:
    # skip initial lines
    for _ in range(skip):
        next(f)

Should be:

with open(filename, 'rt', encoding=encoding) as f:
    # skip initial lines
    for _ in range(skip):
        next(f)
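
The change above can be tried in isolation with a small sketch. Here read_lines is a hypothetical stand-in for the library's read_table, not its actual implementation, and the temporary file with its contents is made up for the example. It only illustrates that passing the encoding through to open() makes the read independent of the system locale.

```python
import os
import tempfile


def read_lines(filename, skip=0, encoding='utf-8'):
    """Hypothetical stand-in for read_table: skip `skip` lines, return the rest."""
    # Passing `encoding` explicitly keeps open() from falling back to the locale codec.
    with open(filename, 'rt', encoding=encoding) as f:
        # skip initial lines
        for _ in range(skip):
            next(f)
        return [line.rstrip('\n') for line in f]


# Write a small UTF-8 file with a header line and a non-ASCII data line.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write('# header\nCW\tCuraçao\n'.encode('utf-8'))
    path = tmp.name

try:
    rows = read_lines(path, skip=1)
finally:
    os.remove(path)

print(rows)
```

Defaulting the parameter to 'utf-8' is a design choice of this sketch; the point is only that the value reaches open() rather than being ignored.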

astrocrazy commented 6 years ago

Worked like a charm. Thank you so much...

DevinCharles commented 6 years ago

Unfortunately, this doesn't solve every issue related to this, but it works in a pinch...

elyase commented 6 years ago

Should be fixed in master:

pip install https://github.com/elyase/geotext/archive/master.zip

Thanks for reporting, and thanks to @DevinCharles for the help.