Mimino666 / langdetect

Port of Google's language-detection library to Python.

UnicodeDecodeError #25

Closed · mathskyit closed this issue 7 years ago

mathskyit commented 7 years ago

I tested this code in IDLE. My code file is encoded in UTF-8:

from langdetect import detect_langs

print detect_langs('こんにちは')

And I got this error:

Traceback (most recent call last):
  File "C:\Users\TUAN\Desktop\test.py", line 3, in <module>
    print detect_langs('こんにちは')
  File "C:\Python27\lib\site-packages\langdetect\detector_factory.py", line 137, in detect_langs
    return detector.get_probabilities()
  File "C:\Python27\lib\site-packages\langdetect\detector.py", line 143, in get_probabilities
    self._detect_block()
  File "C:\Python27\lib\site-packages\langdetect\detector.py", line 147, in _detect_block
    self.cleaning_text()
  File "C:\Python27\lib\site-packages\langdetect\detector.py", line 122, in cleaning_text
    elif ch >= six.u('\u0300') and unicode_block(ch) != 'Latin Extended Additional':
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
Mimino666 commented 7 years ago

I see you are using Python 2.7. In that case, your code should be:

# -*- coding: utf-8 -*-  # Python 2 needs this declaration when the source file contains non-ASCII literals
from langdetect import detect_langs

print detect_langs(u'こんにちは')  # unicode strings should be written as u'...'
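If the text arrives as a byte string instead (for example, read from a UTF-8 encoded file), decoding it to unicode before calling detect_langs works as well. A minimal sketch, assuming Python 2.7; the text variable here is only for illustration:

from langdetect import detect_langs

# Assume `text` holds UTF-8 encoded bytes, e.g. read from a file.
text = '\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf'  # 'こんにちは' as UTF-8 bytes

# Decode the bytes to a unicode object before detection; passing the raw
# byte string would trigger the implicit ASCII decode that caused this error.
print detect_langs(text.decode('utf-8'))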
mathskyit commented 7 years ago

Oh! Thank you.