gambolputty / german-nouns

A list of ~100,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file, plus a module to look up the data and parse compound words.
Creative Commons Attribution Share Alike 4.0 International

UnicodeDecodeError on Windows with Python 3.9 #13

Open xcnox opened 1 year ago

xcnox commented 1 year ago

I was testing the german-nouns module today with the example from the Readme.md and ran into the following error:

    Traceback (most recent call last):
      File "c:\Users\...\Documents\...\....py", line 3, in <module>
        nouns = Nouns()
      File "C:\Users\...\AppData\Local\Programs\Python\Python39\lib\site-packages\german_nouns\lookup\__init__.py", line 23, in __init__
        data = list(csv.reader(open(CSV_FILE_PATH)))
      File "C:\Users\...\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4372: character maps to <undefined>

I edited line 23 of "C:\Users\...\AppData\Local\Programs\Python\Python39\lib\site-packages\german_nouns\lookup\__init__.py", changing it from

    data = list(csv.reader(open(CSV_FILE_PATH)))

to

    data = list(csv.reader(open(CSV_FILE_PATH, encoding='utf-8')))

After that change it worked fine.

I'm using Python 3.9 on a Windows 10 x64 machine.
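
For anyone who wants to reproduce this without the package installed, here is a minimal sketch of what seems to be going on. It assumes the bundled CSV is UTF-8 encoded (the fix above suggests so) and that `open()` without an `encoding` argument falls back to cp1252 on this machine, as the traceback indicates. The sample row and the names `read_rows` and `demo` are made up for illustration and are not part of german-nouns.

```python
import csv
import os
import tempfile


def read_rows(path, encoding):
    """Read a CSV file with an explicit encoding instead of the locale default."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


def demo():
    """Write a UTF-8 CSV, then read it back as cp1252 and as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        # Hypothetical sample row. "\u010d" is encoded in UTF-8 as the bytes
        # 0xC4 0x8D, and 0x8D has no mapping in Python's cp1252 codec.
        with open(path, "w", encoding="utf-8", newline="") as f:
            csv.writer(f).writerow(["Kova\u010d", "m"])

        # Mimic the Windows default by asking for cp1252 explicitly.
        try:
            read_rows(path, "cp1252")
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True

        # The same file read as UTF-8 decodes cleanly.
        rows = read_rows(path, "utf-8")
    finally:
        os.remove(path)
    return cp1252_failed, rows


if __name__ == "__main__":
    failed, rows = demo()
    print("cp1252 read failed:", failed)
    print("utf-8 rows:", rows)
```

Passing `encoding='utf-8'` explicitly makes the read independent of the platform's locale, which is presumably why the one-line change above is enough.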

wrjenny commented 1 year ago

I encountered the same issue, but the workaround above helped, at least for testing the package. Thanks. Ideally, the issue should be fixed in the package itself.