ffreemt / fast-langid

Detect language of a given text, fast

IndexError: list index out of range when using fastlid with single word text 'Elle' and target_languages = ["en", "zh"] #1

Closed lin-xiaosheng closed 8 months ago

lin-xiaosheng commented 8 months ago

Hello,

I encountered an issue when using the fastlid function in the fastlid module. The issue occurs when I try to classify the language of a single-word text with multiple target languages. Other input texts work fine.

Here is my code:

text = 'Elle'
target_languages = ["en", "zh"]

elif module == "langid":
    import langid

    classifier = langid.classify
    if target_languages != None:
        target_languages = [
            lang for lang in target_languages if lang in langid_languages
        ]
        langid.set_languages(target_languages)

lang = classifier(text)[0]

The error message is as follows:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/root/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/root/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/root/miniconda3/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/root/miniconda3/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/root/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/root/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/root/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/Bert-VITS2/syn_server_fastapi.py", line 673, in synthesize
    return await _synthesize(
           ^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/Bert-VITS2/syn_server_fastapi.py", line 487, in _synthesize
    sentences_list = split_by_language(
                     ^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/Bert-VITS2/tools/sentence.py", line 84, in split_by_language
    lang = classify_language(sentence, target_languages)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/Bert-VITS2/tools/classify_language.py", line 137, in classify_language
    lang = classifier(text)[0]
           ^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/fastlid/fastlid.py", line 264, in fastlid
    = [map(lambda x: x[9:], lid)][0], [*map(lambda x: round(x, 3), prob)][0]

It seems that the lid or prob list is empty, causing an IndexError when the code tries to access its first element. I suspect this is due to the input text being a single word, or to multiple target languages being set.

Could you please help investigate this issue? I would appreciate any assistance you can provide.

ffreemt commented 8 months ago

Thanks for reporting the problem.

It's a bug.

fastText detects the text ("Elle") as (('__label__fr', '__label__de', '__label__cs', '__label__fi'), array([9.99864995e-01, 1.12115857e-04, 3.38626596e-05, 2.77091276e-05])). None of these labels is among the target languages, which causes fastlid to throw an IndexError exception.
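A minimal sketch of this failure mode, assuming fastlid keeps only the predicted labels that are in the target languages before taking the first one. The filtering step and the `filter_labels` helper are assumptions made for illustration, not the actual fastlid source:

```python
# Labels and probabilities as reported above for the text "Elle".
labels = ("__label__fr", "__label__de", "__label__cs", "__label__fi")
probs = [9.99864995e-01, 1.12115857e-04, 3.38626596e-05, 2.77091276e-05]
target_languages = ["en", "zh"]


def filter_labels(labels, probs, targets):
    """Hypothetical helper: keep only predictions whose language is a target."""
    return [
        (label[9:], round(prob, 3))  # label[9:] strips the "__label__" prefix
        for label, prob in zip(labels, probs)
        if label[9:] in targets
    ]


kept = filter_labels(labels, probs, target_languages)
print(kept)  # [] because none of fr, de, cs, fi is in ["en", "zh"]

try:
    kept[0]  # indexing the empty list fails
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```

With a longer French sentence the outcome would probably be the same; a single word just makes it more likely that no target language shows up among the top predictions.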

Fixing the bug for target_languages = ["en", "zh"] is easy (for example, using a regex). But I am not quite sure how to fix it for any two given languages in general. I'll need some time to think about it.
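For the en/zh case, a regex-based fallback could look roughly like the sketch below. `guess_en_or_zh` is a hypothetical helper, and treating any character in the basic CJK Unified Ideographs block as Chinese is a simplifying assumption:

```python
import re

# Basic CJK Unified Ideographs block. Assumption: one such character is
# enough to call the text Chinese; everything else is treated as English.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def guess_en_or_zh(text: str) -> str:
    """Hypothetical fallback: "zh" if any CJK character is present, else "en"."""
    return "zh" if CJK_RE.search(text) else "en"


print(guess_en_or_zh("Elle"))  # en
print(guess_en_or_zh("你好"))  # zh
```

This only works because the two scripts do not overlap; it does not generalize to pairs such as en/fr, which is the harder case mentioned above.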

In the meantime, you can wrap the fastlid-related code in try...except and handle the exception, for example:

try:
    # fastlid related stuff
    ...
except IndexError:
    # handle the exception, e.g. fall back to a default language
    ...
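A slightly fuller sketch of that workaround, written against any classifier callable so it runs without fastlid installed. `classify_or_default`, the `default` parameter, and `broken_classifier` are hypothetical names used for illustration only:

```python
from typing import Callable, Tuple


def classify_or_default(
    classifier: Callable[[str], Tuple[str, float]],
    text: str,
    default: str = "en",
) -> str:
    """Hypothetical wrapper: return the detected language, or `default` on IndexError."""
    try:
        return classifier(text)[0]
    except IndexError:
        return default


def broken_classifier(text: str) -> Tuple[str, float]:
    """Stand-in that fails the same way as reported in this issue."""
    empty: list = []
    return empty[0], 0.0  # raises IndexError: list index out of range


print(classify_or_default(broken_classifier, "Elle"))  # en
```

In the reporter's code, the classifier passed in would presumably be fastlid itself, since the traceback shows it being called as classifier(text)[0].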

ffreemt commented 4 months ago

Finally took the time to fix this bug