proposed fix for UnicodeEncodeError: explicit utf-8 encode during stanfo...

NLeSC / xtas

Distributed text analysis suite based on Celery

http://nlesc.github.io/xtas/

proposed fix for UnicodeEncodeError: explicit utf-8 encode during stanfo... #51

Closed nusselder closed 10 years ago

nusselder commented 10 years ago

...rd_ner

When data in Elasticsearch contained a non-ASCII character, a UnicodeEncodeError was thrown. The NLTK-tokenised text is now explicitly encoded as UTF-8 before it is sent to Stanford NER.
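
A minimal sketch of the idea, not the actual patch: the helper name `encode_tokens` and the sample tokens are made up for illustration, and the real code in xtas may look different. It only shows that an implicit ASCII encode fails on a non-ASCII token, while an explicit UTF-8 encode produces bytes that can be written to a socket.

```python
# Hypothetical helper, for illustration only; not taken from the xtas code.
def encode_tokens(tokens, encoding="utf-8"):
    """Join tokens with spaces, end the line, and encode explicitly."""
    text = " ".join(tokens) + "\n"
    return text.encode(encoding)


# Sample tokens as a tokeniser might return them (made-up data).
tokens = ["Zoë", "lives", "in", "Amsterdam"]

# Encoding as ASCII reproduces the kind of error described above.
try:
    encode_tokens(tokens, encoding="ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds and yields a byte string.
payload = encode_tokens(tokens)
```

Whether the receiving side then decodes those bytes as UTF-8 is a separate assumption that this sketch does not check.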

larsmans commented 10 years ago

Doesn't work: your input is simply not UTF-8-encoded. I'll add unidecode to fetch to make it guess the encoding.

larsmans commented 10 years ago

I take that back. 34e02050c19065ef4d08adb720342c5a2a63384f works on my box, but not on Travis.