Closed lrthorita closed 5 years ago
I found the problem.
In the load_file function at spellchecker/utils.py, the gzip.open call is not receiving the encoding argument.
@barrust, please consider changing the gzip.open line of the load_file function to:
with gzip.open(filename, mode="rt", encoding=encoding) as fobj:
Unfortunately, I am not seeing this issue on my Mac or Linux boxes in either Python 3 or Python 2.7, so this may be a Windows-specific issue.
I am looking at passing the encoding explicitly to the gzip.open function, but that does not work in Python 2.7, where gzip.open does not accept an encoding argument.
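One possible workaround that should behave the same on Python 2.7 and Python 3 is to open the archive in binary mode and decode the bytes manually. This is only a sketch under that assumption, not necessarily what the fix in the library does; the function name read_gzipped_text and the demo file are hypothetical.

```python
import gzip
import os
import tempfile


def read_gzipped_text(filename, encoding="utf-8"):
    """Hypothetical helper: return the decompressed file contents as text."""
    # Binary mode is available in both Python 2.7 and 3; decode explicitly afterwards.
    with gzip.open(filename, mode="rb") as fobj:
        return fobj.read().decode(encoding)


# Demo: write UTF-8 bytes to a throwaway gzip file and read them back as text.
_tmp_dir = tempfile.mkdtemp()
_demo_path = os.path.join(_tmp_dir, "demo.txt.gz")
with gzip.open(_demo_path, mode="wb") as fobj:
    fobj.write(u"\u00c1gua".encode("utf-8"))

text = read_gzipped_text(_demo_path)
```

The trade-off is that the whole file is read into memory before decoding, which is fine for a dictionary file that is loaded in full anyway.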
Could you provide the stack trace? That may help me find a workaround.
Thanks!
There is a PR that should resolve your issue. Can you test it? It is on the hotfix/gzip-encoding branch.
Hi @barrust! Thanks for answering.
Indeed, it seems this problem occurs only on Windows. However, I debugged the problem and changed the line as I suggested above, and it worked.
I'll test it.
@barrust, I tested the branch you asked about. It is working.
SpellChecker works for every supported language except Portuguese ('pt'). When I try using spell = SpellChecker('pt'), an error message appears saying:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 587200: character maps to <undefined>
I've also tried to load the dictionary the other way around:
spell = SpellChecker(language=None)
spell.word_frequency.load_dictionary('path/to/pt.json.gz', encoding=u'utf-8')
The same error occurs.
I'm using Python 3.6.8 on Windows 10.
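The 'charmap' error reported above is consistent with UTF-8 data being decoded with a Windows code page. When no encoding is given, Python 3 text mode uses the locale's preferred encoding, which on many Windows installs is a code page such as cp1252. The sketch below assumes cp1252 for illustration (the reporter's actual code page is not stated) and shows that the UTF-8 bytes for "Á" contain 0x81, which cp1252 leaves undefined.

```python
import locale

# UTF-8 encodes U+00C1 (capital A with acute, common in Portuguese) as 0xC3 0x81.
utf8_bytes = u"\u00c1".encode("utf-8")

try:
    # Assumption: cp1252 stands in for the Windows default code page.
    utf8_bytes.decode("cp1252")
    decoded_ok = True
    error_text = ""
except UnicodeDecodeError as exc:
    decoded_ok = False
    error_text = str(exc)

# The encoding Python would pick for text files when none is specified.
print(locale.getpreferredencoding(False))
print(error_text)
```

On a machine where the first print shows a code page rather than UTF-8, opening the Portuguese dictionary without an explicit encoding would be expected to fail in the same way.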