IndigoResearch / textteaser

Official version of TextTeaser.
MIT License
620 stars 143 forks

Bug fix #6

Closed: plean closed this 7 years ago

plean commented 7 years ago

I was getting this error without .decode('utf-8'):

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    sentences = tt.summarize(title, text)
  File "/home/planch_j/Téléchargements/edit_textteser/textteaser/__init__.py", line 13, in summarize
    result = self.summarizer.summarize(text, title, source, category)
  File "/home/planch_j/Téléchargements/edit_textteser/textteaser/summarizer.py", line 11, in summarize
    sentences = self.parser.splitSentences(text)
  File "/home/planch_j/Téléchargements/edit_textteser/textteaser/parser.py", line 61, in splitSentences
    tokenizer = nltk.data.load('file:' + os.path.dirname(os.path.abspath(__file__)) + '/trainer/english.pickle')
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 769, in load
    resource_url = normalize_resource_url(resource_url)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 186, in normalize_resource_url
    name = normalize_resource_name(name, True)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 232, in normalize_resource_name
    resource_name = resource_name.replace('\\', '/').replace(os.path.sep, '/')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 21: ordinal not in range(128)
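
For context, byte 0xc3 is the first byte of the UTF-8 encoding of "é", and position 21 lines up with the "é" in "Téléchargements" once 'file:' is prepended to the path. On Python 2, os.path.abspath(__file__) returns a byte string, and mixing it with unicode strings forces an implicit ASCII decode, which is most likely what fails inside nltk here. Below is a minimal sketch of the kind of fix the comment describes, assuming the path is decoded before the resource URL is built. The helper names to_text and tokenizer_url are hypothetical and not part of textteaser, and UTF-8 is an assumed filesystem encoding.

```python
import os


def to_text(path, encoding="utf-8"):
    """Return path as a text string, decoding byte strings explicitly.

    Hypothetical helper. On Python 2, plain str is bytes, so paths
    returned by os.path functions take the decode branch.
    """
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path


def tokenizer_url(module_file, encoding="utf-8"):
    """Build the 'file:' resource URL for the bundled tokenizer pickle.

    Hypothetical helper. Decoding first means the concatenation only
    ever joins text strings, so no implicit ASCII decode can happen.
    """
    base = to_text(os.path.dirname(os.path.abspath(module_file)), encoding)
    return u"file:" + base + u"/trainer/english.pickle"
```

The result of tokenizer_url(__file__) could then be passed to nltk.data.load in place of the concatenated expression in parser.py. If the filesystem does not use UTF-8, sys.getfilesystemencoding() may be a better choice for the encoding argument.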