jingkang99 opened this issue 7 months ago
I had the same problem with the new model: it does not detect Chinese text (or text in some other languages that use non-Latin Unicode scripts). I was wondering whether the new model expects input in a specific encoding. I tried UTF-8, which works only with the previous model.
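One way to rule out an encoding problem on the caller's side is to confirm that the input bytes decode as UTF-8 and actually contain Han characters before the string is passed to the model. The following is a minimal sketch using only the Python standard library. The helper name `inspect_text` is hypothetical, and the check only covers the main CJK Unified Ideographs block (U+4E00..U+9FFF), not every Chinese character range.

```python
import unicodedata
from typing import Dict


def inspect_text(raw: bytes) -> Dict[str, object]:
    """Decode bytes strictly as UTF-8 and summarize what a model would receive.

    Hypothetical diagnostic helper, not part of fastText.
    """
    # errors="strict" raises UnicodeDecodeError if the bytes are not valid
    # UTF-8 (for example GBK-encoded Chinese), which would point to an
    # encoding problem before the model is even involved.
    text = raw.decode("utf-8", errors="strict")
    # If NFC normalization changes the string, it contains decomposed
    # characters that a model may see differently from composed ones.
    is_nfc = unicodedata.normalize("NFC", text) == text
    # Count characters in the CJK Unified Ideographs block only.
    cjk_chars = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return {"text": text, "is_nfc": is_nfc, "cjk_chars": cjk_chars, "length": len(text)}
```

If this reports valid UTF-8 with a non-zero `cjk_chars` count and the new model still mislabels the text, the input encoding is unlikely to be the cause.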
It also fails to detect Chinese on the Hugging Face demo: https://huggingface.co/facebook/fasttext-language-identification
I found another issue reporting the same problem: https://github.com/facebookresearch/fairseq/issues/5325
Did you find a workaround, @jingkang99?
🐛 Bug
Compared to lid.176.bin, the new model does not perform better. Where is the training data available? Thanks.
To Reproduce
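The original reproduction steps are not included here. As a hypothetical sketch of how the comparison could be reproduced, the code below runs each model on the same samples and collects its top-1 label. It assumes the `fasttext` and `huggingface_hub` packages are installed, that `lid.176.bin` has already been downloaded locally, and that the newer model is the `model.bin` file in the `facebook/fasttext-language-identification` Hugging Face repository. The helper names `top_labels` and `load_and_compare` are illustrative, not part of either library.

```python
from typing import Dict, List, Protocol, Sequence, Tuple


class LidModel(Protocol):
    """Anything with a fastText-style predict(text, k) -> (labels, probs)."""

    def predict(self, text: str, k: int = 1) -> Tuple[Sequence[str], Sequence[float]]: ...


def top_labels(models: Dict[str, LidModel], texts: List[str]) -> Dict[str, List[str]]:
    """Return the top-1 label that each model assigns to each text."""
    results: Dict[str, List[str]] = {}
    for name, model in models.items():
        labels: List[str] = []
        for text in texts:
            # fastText's predict expects a single line, so replace newlines.
            predicted, _probs = model.predict(text.replace("\n", " "), k=1)
            labels.append(predicted[0])
        results[name] = labels
    return results


def load_and_compare(lid176_path: str, samples: List[str]) -> Dict[str, List[str]]:
    """Load both models (assumed locations) and compare their top-1 labels."""
    # Imported lazily so top_labels can be used without these packages.
    import fasttext
    from huggingface_hub import hf_hub_download

    new_path = hf_hub_download(
        repo_id="facebook/fasttext-language-identification", filename="model.bin"
    )
    models = {
        "fasttext-language-identification": fasttext.load_model(new_path),
        "lid.176.bin": fasttext.load_model(lid176_path),
    }
    return top_labels(models, samples)
```

In Colab, something like `print(load_and_compare("lid.176.bin", ["今天天气很好。"]))` would print each model's label for the sample. The two models use different label sets, so compare the printed labels by reading them rather than by string equality.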
Expected behavior
New model: $${\color{red}\text{wrong output}}$$
With `lang_model_large = "lid.176.bin"`, the language is detected correctly: $${\color{green}\text{correct output}}$$
Environment
Google Colab