Lack of documentation to use LID in https://github.com/facebookresearch/fairseq/tree/nllb

Lack of documentation to use LID in https://github.com/facebookresearch/fairseq/tree/nllb #5072

Open aloka-fernando opened 1 year ago

aloka-fernando commented 1 year ago

📚 Documentation

In the NLLB-200 documentation (https://github.com/facebookresearch/fairseq/tree/nllb), there are no details on how to use the LID model (i.e., nllblid218e) for language ID prediction. Could this be shared, please?

Thanks!

ChocoL0rd commented 2 months ago

Let me know when you get the answer.

aloka-fernando commented 2 months ago

@ChocoL0rd You can refer to the code below.

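import fasttext

# Load the LID model (a fastText binary)
model = fasttext.load_model("/path/to/model/nllblid218e")

def get_lid(text):
    """Return the top predicted language code for text, or "UNK" on failure."""
    try:
        # predict() returns (labels, probabilities); k=1 keeps only the top label
        predictions = model.predict(text, k=1)
        # Keep the part of the label after the last "__"
        lang_code = predictions[0][0].strip().split('__')[-1]
        prob = predictions[1][0]

        # return lang_code, prob
        return lang_code
    except Exception:
        # e.g. fastText raises ValueError if text contains a newline
        return "UNK"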
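
If it helps, here is a slightly more defensive sketch of the same idea. It is not from the NLLB docs. The helper name `predict_lid` is made up for illustration, and it assumes the model labels carry fastText's default `__label__` prefix. It collapses newlines and extra whitespace before calling `predict` (fastText rejects input containing a newline) and returns the top `k` codes with their probabilities instead of hiding errors behind "UNK".

```python
def predict_lid(model, text, k=1, prefix="__label__"):
    """Return up to k (lang_code, probability) pairs for text.

    `model` is anything with a fastText-style predict(text, k=...) method,
    e.g. the object returned by fasttext.load_model(...).
    `prefix` is an assumption: fastText's default label prefix.
    """
    # fastText predicts one line at a time, so collapse newlines/whitespace.
    cleaned = " ".join(text.split())
    if not cleaned:
        return []

    labels, probs = model.predict(cleaned, k=k)

    results = []
    for label, prob in zip(labels, probs):
        # Strip the label prefix if present; otherwise keep the label as-is.
        code = label[len(prefix):] if label.startswith(prefix) else label
        results.append((code, float(prob)))
    return results


# Usage (requires the fasttext package and the downloaded model file):
#   import fasttext
#   model = fasttext.load_model("/path/to/model/nllblid218e")
#   print(predict_lid(model, "Some text\nspanning two lines", k=3))
```

Passing the model in as an argument keeps the helper easy to try out with a stand-in object before loading the real model.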