digsy89 / parasol

Korean tokenizer with character decompositions.
Apache License 2.0

OSError: Not found: when loading bpe.model #1

Closed: Eugen2525 closed this issue 4 years ago

Eugen2525 commented 4 years ago

Hi, I got the error below when using the tokenizer:

from parasol import Tokenizer
t2 = Tokenizer(decompose=True)

error:

Traceback (most recent call last):
  File "D:/test_korean_tokenizer.py", line 13, in <module>
    t2 = Tokenizer(decompose=True)
  File "C:\..\AppData\Local\Programs\Python\Python37\lib\site-packages\parasol\tokenize.py", line 31, in __init__
    self.spp.load(model.as_posix())
  File "C:\..\AppData\Local\Programs\Python\Python37\lib\site-packages\sentencepiece.py", line 214, in load
    return _sentencepiece.SentencePieceProcessor_load(self, filename)
OSError: Not found: "C:/../AppData/Local/Programs/Python/Python37/lib/site-packages/parasol/resources/decomposed/bpe.model": Illegal byte sequence Error #42

Eugen2525 commented 4 years ago

OK, resolved it. The problem was that the path contains Hangul (Korean) characters.
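To check whether an installation is affected, a small diagnostic can locate the model file that the traceback points at and list any non-ASCII characters in its path. This is a sketch under assumptions: it assumes the model lives at `parasol/resources/decomposed/bpe.model` inside the installed package (the location shown in the traceback for `decompose=True`), and the helper names `parasol_model_path` and `non_ascii_chars` are hypothetical, not part of parasol. It only uses the standard library and does not import parasol or sentencepiece.

```python
import importlib.util
from pathlib import Path


def non_ascii_chars(path):
    """Return the sorted set of characters in `path` outside 7-bit ASCII."""
    return sorted({ch for ch in str(path) if ord(ch) > 127})


def parasol_model_path():
    """Hypothetical helper: locate the decomposed bpe.model of an installed parasol.

    Assumes the resources layout seen in the traceback
    (parasol/resources/decomposed/bpe.model). Returns None if parasol
    cannot be found in the current environment.
    """
    spec = importlib.util.find_spec("parasol")
    if spec is None or spec.origin is None:
        return None
    # spec.origin is the package's __init__.py; resources sit next to it.
    return Path(spec.origin).parent / "resources" / "decomposed" / "bpe.model"


if __name__ == "__main__":
    model = parasol_model_path()
    if model is None:
        print("parasol is not installed in this environment")
    else:
        print(model)
        print("non-ASCII characters in path:", non_ascii_chars(model) or "none")
```

If the list is non-empty, one plausible workaround, consistent with the cause identified above but not confirmed by the maintainer, is to install Python (or a virtual environment) under a directory whose full path is ASCII-only.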