cnorthwood / ternip

Temporal Expression Recognition and Normalisation in Python

Output doesn't handle non-ascii gracefully #14

Open leondz opened 12 years ago

leondz commented 12 years ago

From TAC_2010_KBP_Source_Data/data/2010/wb/eng-WL-11-174596-12957493.sgm (http://pastebin.com/Wz2QKEAZ):

Traceback (most recent call last):
  File "/usr/local/bin/annotate_timex", line 154, in
    print str(doc)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 662: ordinal not in range(128)
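The failure can be reproduced without ternip. This is a minimal sketch, assuming Python 3. The traceback above comes from Python 2, where, if `doc.__str__` returns a `unicode` object, `str(doc)` implicitly encodes it with the default `ascii` codec. Encoding text containing U+201C (left double quotation mark) with the ASCII codec raises the same `UnicodeEncodeError`, and `err.start` is the "position" reported in the message. `first_non_ascii` is a hypothetical helper, not part of ternip.

```python
def first_non_ascii(text):
    """Hypothetical helper: return (index, code point) of the first character
    the ASCII codec cannot encode, or None if the text is pure ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        # err.start is the offset of the offending character, i.e. the
        # "position 662" part of the original error message.
        return err.start, "U+%04X" % ord(text[err.start])
    return None


if __name__ == "__main__":
    sample = "He said \u201chello\u201d"
    print(first_non_ascii(sample))
    # Lossy alternatives that never raise, at the cost of altering the text:
    print(sample.encode("ascii", errors="replace"))
    print(sample.encode("ascii", errors="backslashreplace"))
```

The `errors=` handlers only hide the problem by substituting characters; they are shown for comparison, not as a fix for annotated output.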

cnorthwood commented 12 years ago

I think this may be down to the terminal's encoding being ASCII-only? Not entirely sure... A possible fix would be to (re)open stdout in UTF-8 mode where possible. I'll investigate a bit more when I have time.
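Reopening stdout in UTF-8 mode could look roughly like this. It is a sketch assuming Python 3.7+, where `io.TextIOWrapper.reconfigure()` exists; `ensure_utf8_stdout` is a hypothetical helper, not part of ternip. Under Python 2 the usual equivalent is wrapping `sys.stdout` with `codecs.getwriter('utf-8')`.

```python
import codecs
import io
import sys


def ensure_utf8_stdout(stream=None):
    """Hypothetical helper (not part of ternip): make a text stream encode as UTF-8.

    Uses io.TextIOWrapper.reconfigure() (Python 3.7+) when available.
    Returns the stream callers should write to.
    """
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None)
    if encoding and codecs.lookup(encoding).name == "utf-8":
        return stream  # already UTF-8: leave it alone
    if hasattr(stream, "reconfigure"):
        # Flushes, then switches the encoder in place for subsequent writes.
        stream.reconfigure(encoding="utf-8")
        return stream
    if hasattr(stream, "buffer"):
        # Text streams without reconfigure(): wrap the underlying binary buffer.
        return io.TextIOWrapper(stream.buffer, encoding="utf-8", line_buffering=True)
    return stream  # nothing safe to do; writes may still raise encode errors


if __name__ == "__main__":
    out = ensure_utf8_stdout()
    out.write("\u201cquoted\u201d\n")
```

One caveat with this approach: it only helps if the encoding happens when writing to the stream. In the Python 2 traceback the error is raised by the `str(doc)` call itself, so changing stdout alone would likely need to be paired with printing the `unicode` value directly rather than passing it through `str()`.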

leondz commented 12 years ago

Well, if you feel like it! I'm putting these up partly as a note to myself to fix them; it just seems like the best place to keep bug reports.

cnorthwood commented 12 years ago

largely just throwing my own thoughts out there too tbh :)