Mimino666 / langdetect

Port of Google's language-detection library to Python.

got LangDetectException for Arabic Presentation Forms #22

Open mbande opened 8 years ago

mbande commented 8 years ago

For characters in both the Arabic_Presentation_Forms-A and Arabic_Presentation_Forms-B blocks, the detect function throws an exception:

>>> detect('ﺽ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.5/site-packages/langdetect/detector_factory.py", line 130, in detect
    return detector.detect()
  File "/home/user/.local/lib/python3.5/site-packages/langdetect/detector.py", line 135, in detect
    probabilities = self.get_probabilities()
  File "/home/user/.local/lib/python3.5/site-packages/langdetect/detector.py", line 142, in get_probabilities
    self._detect_block()
  File "/home/user/.local/lib/python3.5/site-packages/langdetect/detector.py", line 149, in _detect_block
    raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
langdetect.lang_detect_exception.LangDetectException: No features in text.
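
One possible workaround for the traceback above, offered as a sketch and not as something the thread confirms, is to fold presentation-form characters back to their base letters with Unicode NFKC normalization before calling detect, and to catch the exception for inputs that still yield no features. The helper names `fold_presentation_forms` and `safe_detect` are hypothetical; only `detect` and `LangDetectException` come from langdetect, and whether detection then succeeds for very short inputs is not guaranteed.

```python
import unicodedata


def fold_presentation_forms(text):
    """Map compatibility characters (such as Arabic Presentation Forms) to base letters."""
    # NFKC applies compatibility decompositions, so U+FEBD
    # (ARABIC LETTER DAD ISOLATED FORM) becomes U+0636 (ARABIC LETTER DAD).
    return unicodedata.normalize("NFKC", text)


def safe_detect(text, detect_fn, errors, default=None):
    """Hypothetical helper: normalize the text, call detect_fn, return default on failure."""
    try:
        return detect_fn(fold_presentation_forms(text))
    except errors:
        return default


# Usage with langdetect (assumes the package is installed):
#   from langdetect import detect
#   from langdetect.lang_detect_exception import LangDetectException
#   language = safe_detect('ﺽ', detect, errors=(LangDetectException,))
```

Passing the detector and the exception types in as arguments keeps the helper independent of langdetect, so the normalization step can be checked on its own.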

stoufa commented 6 years ago

I have the same issue! You have to add the u prefix to the string: detect(u'ض'). But what if the string is given by the user? How should that use case be handled?

stoufa commented 6 years ago

After a few searches I found this Stack Overflow post. You have to add these three lines of code to your script:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

and convert the text to unicode:

inputText = unicode(inputText)
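
The lines in this comment are Python 2 only: the `reload` builtin, `sys.setdefaultencoding`, and the `unicode` builtin do not exist in Python 3. A minimal sketch of the explicit-decoding alternative follows, assuming user input may arrive as UTF-8 encoded bytes; `to_text` is a hypothetical helper, not part of langdetect.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes with the given encoding (UTF-8 is an assumption)."""
    if isinstance(value, bytes):
        # Decode explicitly instead of changing the interpreter-wide default encoding.
        return value.decode(encoding)
    return value


# Example: bytes read from a socket or a file opened in binary mode.
raw_input_bytes = "ض".encode("utf-8")
input_text = to_text(raw_input_bytes)
```

Decoding at the point where the input enters the program avoids touching global interpreter state, and values that are already `str` pass through unchanged.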