How to use the Language Identification Model of NLLB?

facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

MIT License

How to use the Language Identification Model of NLLB? #4956

rcv-koo opened this issue 1 year ago

rcv-koo commented 1 year ago

What is your question?

How do I use the language identification (LID) model trained on FLORES-200 that is mentioned in the NLLB paper? The model is available in the repo, but I can't find any code showing how to actually run it.

Also, is there a Hugging Face (HF) implementation of it?

julien-c commented 1 year ago

(might also be of interest to @sheonhan)

WilliamTambellini commented 1 year ago

fvolchyok commented 10 months ago

Very late, but for anyone interested: assuming you're asking about the lid218e.bin model, you can load it with the fasttext library:

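import fasttext

# Load the NLLB LID model (path is relative to the current working directory)
fasttext_model = fasttext.load_model('lid218e.bin')
# k=3 returns the three most likely language labels and their probabilities
print(fasttext_model.predict("русский язык", k=3))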

Output:

(('__label__rus_Cyrl', '__label__tat_Cyrl', '__label__ukr_Cyrl'),
 array([9.72893476e-01, 2.59862132e-02, 4.44931240e-04]))
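Each label returned by predict() carries a `__label__` prefix followed by a FLORES-200 code, which combines a language code with a script code (e.g. rus_Cyrl). Below is a minimal sketch, not an official helper from fairseq or fasttext, that turns that output into plain (language, script, probability) triples. The function names parse_predictions and clean_for_predict are hypothetical. It also flattens newlines before prediction, since, as far as I know, fasttext's Python predict() raises a ValueError when the input text contains a newline.

```python
from typing import List, Sequence, Tuple

# fastText prepends this prefix to every label it predicts
LABEL_PREFIX = "__label__"


def clean_for_predict(text: str) -> str:
    """Collapse newlines (and other runs of whitespace) into single spaces.

    fasttext's predict() works on one line at a time, so multi-line input
    has to be flattened first.
    """
    return " ".join(text.split())


def parse_predictions(
    labels: Sequence[str], probs: Sequence[float]
) -> List[Tuple[str, str, float]]:
    """Turn fasttext (labels, probs) output into (language, script, probability) triples.

    Assumes labels look like '__label__rus_Cyrl' (language code, underscore, script).
    """
    results = []
    for label, prob in zip(labels, probs):
        # Strip the fastText prefix if it is present
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # Split 'rus_Cyrl' into ('rus', 'Cyrl') at the first underscore
        lang, _, script = code.partition("_")
        # float() also converts numpy scalars coming from the probability array
        results.append((lang, script, float(prob)))
    return results


if __name__ == "__main__":
    # Values copied from the predict() output shown above
    labels = ("__label__rus_Cyrl", "__label__tat_Cyrl", "__label__ukr_Cyrl")
    probs = [9.72893476e-01, 2.59862132e-02, 4.44931240e-04]
    for lang, script, p in parse_predictions(labels, probs):
        print(f"{lang} ({script}): {p:.4f}")
```

With a loaded model you would pass clean_for_predict(text) to fasttext_model.predict(..., k=3) and unpack the returned (labels, probs) tuple into parse_predictions.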