ChristianSch / crime-stats-nlp

NLP based crime stats extraction for police reports taken from http://www.presseportal.de
MIT License

Stanford NER output fails due to `ascii` encoding #2

Open ChristianSch opened 9 years ago

ChristianSch commented 9 years ago

Even though the tiles are passed in UTF-8 encoded, the NER bindings try to decode the tagger output as ASCII, and this fails:

Traceback (most recent call last):
  File "index.py", line 42, in <module>
    tag = st.tag(tiles[0].encode('utf-8').split())
  File "/usr/local/lib/python2.7/site-packages/nltk/tag/stanford.py", line 59, in tag
    return self.tag_sents([tokens])[0]
  File "/usr/local/lib/python2.7/site-packages/nltk/tag/stanford.py", line 82, in tag_sents
    stanpos_output = stanpos_output.decode(encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
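
For reference, the failure can be reproduced without the tagger. This is a minimal sketch, not the code from `index.py`: the sample word and the `try_decode` helper are made up for illustration. Byte `0xc3` is the lead byte of a two-byte UTF-8 sequence (German umlauts and `ß` all start with it), so decoding UTF-8 bytes with the `ascii` codec fails at the first such character.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical sample: a German word whose 4th character (index 3) is "ä".
sample = u"Gef\u00e4ngnis"
raw = sample.encode("utf-8")  # "ä" becomes the two bytes 0xc3 0xa4


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, err


# Decoding the UTF-8 bytes as ASCII fails at the first non-ASCII byte.
text_ascii, err_ascii = try_decode(raw, "ascii")

# Decoding with the encoding the bytes were written in succeeds.
text_utf8, err_utf8 = try_decode(raw, "utf-8")

print("ascii:", err_ascii)
print("utf-8:", repr(text_utf8))
```

If the installed NLTK version's Stanford tagger constructor accepts an `encoding` argument, passing `encoding='utf8'` there may be worth trying. That is an assumption and has not been checked against the NLTK version in the traceback.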