aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com

Tried to run the demo and got a UTF-8 error. #222

Open jarodtang opened 4 years ago

jarodtang commented 4 years ago

I tried to run the demo on a Mac and got an error. How can I fix it?

from polyglot.text import Text, Word

words = ["preprocessing", "processor", "invaluable", "thankful", "crossed"]
for w in words:
  w = Word(w, language="en")
  print("{:<20}{}".format(w, w.morphemes))

and got the following error:

`Traceback (most recent call last):
  File "de_word.py", line 6, in <module>
    print("{:<20}{}".format(w, w.morphemes))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/polyglot/decorators.py", line 20, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/polyglot/text.py", line 286, in morphemes
    words, score = self.morpheme_analyzer.viterbi_segment(self.string)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/polyglot/decorators.py", line 20, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/polyglot/text.py", line 282, in morpheme_analyzer
    return load_morfessor_model(lang=self.language)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/polyglot/decorators.py", line 30, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/polyglot/load.py", line 139, in load_morfessor_model
    tmp_file_.write(file_handler.read())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 23: invalid start byte`
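
For anyone reading the last line of that traceback: it says the bytes being read are not valid UTF-8 text, and it reports the offset of the first offending byte. Below is a minimal sketch of how to locate such a byte in arbitrary data. The helper name `first_invalid_utf8` and the sample bytes are made up for illustration; this is not the actual model file and not part of polyglot.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value, reason) for the first byte that
    cannot be decoded as UTF-8, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte.
        return err.start, data[err.start], err.reason
    return None


# Hypothetical sample: 23 ASCII bytes followed by 0x86, which is a
# UTF-8 continuation byte and so cannot start a character.
sample = b"x" * 23 + b"\x86"
print(first_invalid_utf8(sample))  # (23, 134, 'invalid start byte')
```

Running the same check on the bytes of a downloaded file can help tell whether the file is text at all, or whether it is binary or damaged data.
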
jarodtang commented 4 years ago

Fixed by rerunning "polyglot download morph2.en".
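
If you would rather trigger the same download from a script, here is a rough sketch that shells out to the command above. The helper names `download_command` and `redownload` are hypothetical, and it assumes the `polyglot` command-line tool is installed and on your PATH.

```python
import shutil
import subprocess


def download_command(package="morph2.en"):
    """Build the argument list for the CLI call quoted in this thread."""
    return ["polyglot", "download", package]


def redownload(package="morph2.en"):
    """Run the download and return True if the command exits with 0.

    Returns False without running anything when the polyglot CLI
    cannot be found on PATH.
    """
    if shutil.which("polyglot") is None:
        return False
    result = subprocess.run(download_command(package), check=False)
    return result.returncode == 0
```

Calling `redownload()` with no arguments runs the exact command that resolved this issue; pass another package name to fetch a different model.
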