how can corenlp handle non-ascii string?

dasmith / stanford-corenlp-python

Python wrapper for Stanford CoreNLP tools v3.4.1

GNU General Public License v2.0

610 stars, 229 forks

how can corenlp handle non-ascii string? #31

Open: wxbks opened this issue 9 years ago

wxbks commented 9 years ago

I passed the word 'Víctor' to corenlp.parse. 'Víctor' contains a non-ASCII character, and I would like to get its lemma. But when I call corenlp.parse('Víctor'), it raises an error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128).
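The byte 0xc3 in that message is the first byte of the UTF-8 encoding of 'í' (U+00ED, encoded as 0xc3 0xad), which sits at index 1 of 'Víctor'. A likely cause, assuming the wrapper runs under Python 2, is that 'Víctor' is a UTF-8 byte string that some code path decodes implicitly with the default ASCII codec. The minimal Python 3 sketch below reproduces the same error with an explicit decode and shows that decoding as UTF-8 succeeds. It does not touch the wrapper itself; the variable names are illustrative only.

```python
# Python 3 sketch reproducing the error outside CoreNLP.
# 'Víctor' as UTF-8 bytes: 'í' (U+00ED) becomes the two bytes 0xc3 0xad.
raw = "Víctor".encode("utf-8")
print(raw)  # b'V\xc3\xadctor'

# Decoding those bytes as ASCII fails on the first byte >= 128 (0xc3 at index 1).
ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    ascii_error = str(err)  # keep the message; `err` is unbound after the except block
print(ascii_error)

# Decoding with the codec the bytes were actually written in works.
text = raw.decode("utf-8")
print(text)  # Víctor
```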

How can I change the CoreNLP settings so that it can handle non-ASCII strings?

cuzzo commented 8 years ago

Hey @sicongkuang,

You could try running something like unidecode on your input string first. I ran into a similar error, and that fixed the problem for me.
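With the third-party unidecode package, the suggestion amounts to `from unidecode import unidecode` followed by `unidecode(u'Víctor')`, which returns the ASCII string 'Victor'. For accented Latin text, a rough stdlib-only approximation is to apply Unicode NFKD normalization and drop the combining marks; the sketch below uses that swapped-in technique, and the helper name `strip_accents` is hypothetical. One caveat: CoreNLP then sees 'Victor' rather than 'Víctor', so the lemma you get back is for the de-accented form.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper approximating unidecode for accented Latin text.

    NFKD splits a precomposed letter such as 'í' into a base letter 'i' plus
    a combining mark (U+0301); dropping the combining marks leaves ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters with no ASCII decomposition (e.g. CJK) are silently dropped here,
    # whereas unidecode would transliterate many of them.
    return stripped.encode("ascii", "ignore").decode("ascii")


print(strip_accents("Víctor"))    # Victor
print(strip_accents("Ångström"))  # Angstrom
```

The resulting ASCII string can then be passed to corenlp.parse in place of the original, as in the workaround above.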

Hope it works for you.

Cheers,

wxbks commented 8 years ago

Hey @cuzzo, thank you so much! Yes, that solved my problem.