How to use the NLLB Language identification .bin model?

facebookresearch / fairseq

How to use the NLLB Language identification .bin model? #4592

Closed: ArturPrzybysz closed this issue 2 years ago

ArturPrzybysz commented 2 years ago

Hi! Thank you for your great work and for making it publicly available!

I am trying to use your NLLB model, and thanks to the Hugging Face integration that is easy to do. However, you have also published the LID model's .bin file, and I am struggling to use it.

Can you provide a simple example of its usage? What is the .bin file?

Celebio commented 2 years ago

Hi @ArturPrzybysz, you can use the model with fastText.

A simple command looks like this:

./fasttext predict-prob lid218e.bin mytestfile.txt

where mytestfile.txt contains one sentence per line to be classified.

More information on how to use fastText is available here.
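For anyone post-processing the command's output: `predict-prob` prints one line per input sentence, made up of label/probability pairs separated by spaces. Below is a minimal sketch of parsing such a line in Python. The helper name `parse_predict_prob_line` is hypothetical, and the example label `eng_Latn` is an assumption about how this model names languages; `__label__` is fastText's default label prefix.

```python
from typing import List, Tuple

# fastText's default label prefix; assumed unchanged for this model.
LABEL_PREFIX = "__label__"


def parse_predict_prob_line(line: str) -> List[Tuple[str, float]]:
    """Parse one output line of `fasttext predict-prob`.

    A line is expected to look like
    "__label__eng_Latn 0.998" or, when more than one prediction is
    requested, "__label__a 0.9 __label__b 0.05".
    Returns (label without prefix, probability) pairs in output order.
    """
    tokens = line.split()
    pairs = []
    # Tokens alternate: label, probability, label, probability, ...
    for label, prob in zip(tokens[0::2], tokens[1::2]):
        if label.startswith(LABEL_PREFIX):
            label = label[len(LABEL_PREFIX):]
        pairs.append((label, float(prob)))
    return pairs
```

Each returned pair can then be matched back to the corresponding line of mytestfile.txt by position, since the output keeps the input order.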

Regards, Onur

ArturPrzybysz commented 2 years ago

@Celebio Thank you for the help! I managed to make it work in code thanks to you.
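For later readers, here is a rough sketch of what in-code usage could look like with the `fasttext` Python package (`fasttext.load_model` and `model.predict`). This is not necessarily the code used above; the helper names `strip_label`, `identify_language`, and `load_lid_model` are hypothetical, and the local path `lid218e.bin` assumes the file has already been downloaded.

```python
from typing import List, Tuple

# fastText's default label prefix; assumed unchanged for this model.
LABEL_PREFIX = "__label__"


def strip_label(label: str) -> str:
    """Remove the fastText label prefix if it is present."""
    if label.startswith(LABEL_PREFIX):
        return label[len(LABEL_PREFIX):]
    return label


def identify_language(model, sentence: str, k: int = 1) -> List[Tuple[str, float]]:
    """Return the top-k (language label, probability) pairs for one sentence.

    `model` is any object with a fastText-style `predict(text, k=...)`
    method that returns (labels, probabilities) for a single string.
    """
    # fastText's predict handles one line at a time, so newlines are
    # replaced with spaces before prediction.
    clean = sentence.replace("\n", " ")
    labels, probs = model.predict(clean, k=k)
    return [(strip_label(label), float(prob)) for label, prob in zip(labels, probs)]


def load_lid_model(path: str = "lid218e.bin"):
    """Load the LID model from a local .bin file (path is an assumption)."""
    import fasttext  # imported lazily so the helpers above work without it

    return fasttext.load_model(path)
```

Usage would be along the lines of `identify_language(load_lid_model(), "Hello, how are you?", k=3)`, which returns the three most likely labels with their probabilities.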

baiziyuandyufei commented 2 years ago

Where is lid218e.bin?

ArturPrzybysz commented 2 years ago

@baiziyuandyufei It's here: https://github.com/facebookresearch/fairseq/tree/nllb#lid-model

fatihbeyhan commented 2 years ago

Hi,

I suppose this is the language identification model that was used in the NLLB paper. Can you please tell me where the fastText checkpoint (.bin) for this LID model is? This link does not seem to lead anywhere: https://github.com/facebookresearch/fairseq/tree/nllb#lid-model