facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

Improved LangID model? #1345

Open loretoparisi opened 11 months ago

loretoparisi commented 11 months ago

The fairly recent (2020) post about M2M-100 MMT, "The first AI model that translates 100 languages without relying on English data", states:

"As part of this effort, we created a new LASER 2.0 and improved fastText language identification, which improves the quality of mining and includes open sourced training and evaluation scripts."

While LASER 2.0 (93 languages) and even LASER 3.0, which includes a new encoder supporting over 200 languages, have been released, I'm not aware of any release of a newer version of the 176-language fastText LangID model in this repository.