glample / tagger

Named Entity Recognition Tool
Apache License 2.0

Unicode #45

Closed: mahsash closed this issue 6 years ago

mahsash commented 7 years ago

Hi. I'm having problems training my own model on a Persian dataset: it gives me an error at the beginning of the training phase. My dataset is UTF-8 encoded and in CoNLL-2003 format. Does glample/tagger support UTF-8? If so, what else could be the problem? The error:

"file", line 43, in update_tag_scheme 'Please check sentence %i:\n%s' % (i, s_str)) Exception: <exception str() failed>
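Reading the traceback, the failure comes from `update_tag_scheme`. As far as I can tell from the tagger source (treat this as an assumption), that function raises when a sentence's tags are not in IOB format, i.e. each tag must be `O` or `B-TYPE` / `I-TYPE`. The `<exception str() failed>` part then suggests Python 2 could not render the non-ASCII sentence inside the error message, which hides the real cause. A minimal Python 3 sketch to locate such sentences yourself, assuming the tag is the last whitespace-separated column and sentences are separated by blank lines; the helper names are hypothetical and not part of tagger:

```python
import io


def iob_tag_ok(tag):
    """Return True if `tag` is 'O' or has the form 'B-TYPE' / 'I-TYPE'
    with exactly one hyphen (the rule we assume tagger enforces)."""
    if tag == "O":
        return True
    parts = tag.split("-")
    return len(parts) == 2 and parts[0] in ("B", "I")


def load_conll_sentences(path, encoding="utf-8"):
    """Split a CoNLL-style file into sentences (blank-line separated).
    Each token line is split on whitespace; -DOCSTART- lines are skipped."""
    sentences, current = [], []
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line:
                if current:
                    sentences.append(current)
                    current = []
            elif not line.startswith("-DOCSTART-"):
                current.append(line.split())
    if current:
        sentences.append(current)
    return sentences


def find_bad_sentences(sentences):
    """Yield (sentence index, list of malformed tags) for each bad sentence."""
    for i, sentence in enumerate(sentences):
        bad = [w[-1] for w in sentence if not iob_tag_ok(w[-1])]
        if bad:
            yield i, bad
```

For example, `for i, bad in find_bad_sentences(load_conll_sentences("train.txt")): print(i, bad)` lists every sentence whose tags would need fixing (the file name is a placeholder). Running under Python 3 also avoids the Python 2 `str()` failure, so the printed sentences stay readable.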


dungtn commented 7 years ago

You might need to change the encoding from 'utf8' to your data's actual encoding; e.g., I used 'latin-1' for Spanish and German.
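Rather than guessing, you can first check which encoding actually decodes your training file. A small sketch; `first_working_encoding` is a hypothetical helper, not part of tagger. Note that 'latin-1' maps every byte to some character, so it never fails: keep it last as a fallback, and remember that a successful 'latin-1' decode does not prove the text really is Latin-1.

```python
import io


def first_working_encoding(path, candidates=("utf-8", "latin-1")):
    """Return the first encoding in `candidates` that decodes the whole
    file without a UnicodeDecodeError, or None if none of them do."""
    for enc in candidates:
        try:
            with io.open(path, "r", encoding=enc) as f:
                f.read()  # force decoding of the entire file
            return enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    return None
```

If this returns "utf-8" for your file, the encoding is probably not the problem and the tags themselves are worth checking.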

Rabia-Noureen commented 7 years ago

Hi Sir, I am also having the same issue with an English dataset. My dataset, stanfordSentimentTreebank, is encoded in UTF-8, and I am using the GoogleNews pretrained word embeddings, which come as a .gz file. Kindly guide me, as I am stuck on this error.
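On the embeddings side: the GoogleNews vectors are distributed as a gzip-compressed *binary* word2vec file. Tagger's pretrained-embedding loader, as far as I know, reads plain text with one `word v1 ... vN` line per word; treat that as an assumption and check the loader in your checkout. Judging by the traceback, the `update_tag_scheme` error comes from the training data rather than the embeddings, so this conversion is a separate step. Below is a minimal pure-Python sketch of a converter. It assumes the standard binary layout: an ASCII `vocab_size dim` header, then each word followed by a space and `dim` little-endian float32 values. `word2vec_bin_gz_to_text` is a hypothetical helper. If gensim is installed, `KeyedVectors.load_word2vec_format(path, binary=True)` followed by `save_word2vec_format(out, binary=False)` does a similar job.

```python
import gzip
import struct


def read_word(f):
    """Read bytes up to the next space, skipping the newline that
    follows the previous vector in the binary format."""
    chars = []
    while True:
        ch = f.read(1)
        if ch == b" ":
            break
        if ch == b"":
            raise EOFError("unexpected end of file while reading a word")
        if ch != b"\n":
            chars.append(ch)
    return b"".join(chars).decode("utf-8", errors="replace")


def word2vec_bin_gz_to_text(src, dst, limit=None):
    """Convert a gzip'd binary word2vec file to plain text, one
    'word v1 ... vD' line per word and no header line.
    `limit` optionally keeps only the first N vectors.
    Returns the number of vectors written."""
    with gzip.open(src, "rb") as f_in, open(dst, "w", encoding="utf-8") as f_out:
        vocab_size, dim = map(int, f_in.readline().split())
        n = vocab_size if limit is None else min(limit, vocab_size)
        fmt = "<%df" % dim  # dim little-endian float32 values
        for _ in range(n):
            word = read_word(f_in)
            vec = struct.unpack(fmt, f_in.read(4 * dim))
            f_out.write(word + " " + " ".join("%.6f" % x for x in vec) + "\n")
    return n
```

The full GoogleNews file holds about 3 million 300-dimensional vectors, so the text output is large. Passing a `limit` keeps only the most frequent words (the file is ordered by frequency) and makes a much smaller file to experiment with.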

Rabia-Noureen commented 7 years ago

@dungtn, can you please help me solve this issue?