HamzaHawk / guess-language

Automatically exported from code.google.com/p/guess-language
GNU Lesser General Public License v2.1

Exception for Unicode chars > 0xFFFF #11

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
Unicode characters outside the Basic Multilingual Plane (ord(c) > 0xFFFF) cause an exception:

Traceback (most recent call last):
  File "describe-channels.py", line 20, in <module>
    lang = guess_language.guessLanguage(" ".join(row.get('text', [])))
  File "/usr/local/lib/python2.6/dist-packages/guess_language/guess_language.py", line 300, in guessLanguage
    return _identify(text, find_runs(text))
  File "/usr/local/lib/python2.6/dist-packages/guess_language/guess_language.py", line 352, in find_runs
    block = unicodeBlock(c)
  File "/usr/local/lib/python2.6/dist-packages/guess_language/blocks.py", line 64, in unicodeBlock
    return _names[ix]
IndexError: list index out of range

Original issue reported on code.google.com by vale...@adbeat.com on 17 Sep 2012 at 8:02
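The traceback suggests that unicodeBlock in blocks.py computes an index ix and reads _names[ix] without checking it against the table bounds, so a code point beyond the last known block runs off the end of the list. The real contents of blocks.py are not shown in this issue. The sketch below is a hypothetical bounds-checked lookup, assuming a sorted table of (start, end, name) ranges searched with bisect. The names _BLOCKS, unicode_block and strip_astral are illustrative only, not the package's API, and the table lists just a few sample blocks.

```python
import bisect

# Hypothetical block table: sorted, non-overlapping (start, end, name) ranges.
# The real blocks.py table is not shown in the issue; these entries are
# illustrative samples only.
_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0600, 0x06FF, "Arabic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]
# Start code points, kept separately so bisect can search them directly.
_starts = [start for start, _end, _name in _BLOCKS]


def unicode_block(c):
    """Return the block name for character c, or None if no block covers it."""
    cp = ord(c)
    # Index of the last block whose start is <= cp (-1 if cp precedes all blocks).
    ix = bisect.bisect_right(_starts, cp) - 1
    if ix < 0:
        return None
    start, end, name = _BLOCKS[ix]
    # Code points past the end of the table (e.g. > 0xFFFF) or in gaps between
    # blocks fall through here instead of raising IndexError.
    if cp > end:
        return None
    return name


def strip_astral(text):
    """Caller-side workaround: drop characters above U+FFFF before guessing."""
    return "".join(c for c in text if ord(c) <= 0xFFFF)
```

Returning None instead of raising means the caller has to treat an unknown block explicitly, for example by skipping that character. strip_astral is a cruder workaround that could be applied to the text before passing it to an unpatched guessLanguage, at the cost of discarding those characters.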

GoogleCodeExporter commented 8 years ago

Hi,

As mentioned on the main page, this package is no longer maintained. Please 
report any issues to my fork: 
https://bitbucket.org/spirit/guess_language

Although my version is a Python 3 port, I also try to support Python 2 where 
that isn't too hard.

That being said, I believe my version is not affected by this issue.

Original comment by hiddensp...@gmail.com on 25 Sep 2012 at 9:32