There is no high-quality open-source foundation model designed with Azerbaijani in mind. Most multilingual models contain little or no Azerbaijani in their training data. ai-forever/mGPT-1.3B-azerbaijan on Hugging Face appears to be the only model trained specifically for Azerbaijani, but even that is built on top of the multilingual mGPT model rather than trained from scratch.
We believe that at least two different series of foundation models are necessary:
[ ] BERT-based models, to be customized for classification tasks, and
[ ] GPT-based models, to be customized for text generation tasks.
The two series use different pretraining objectives (masked language modeling for BERT, causal language modeling for GPT), but we are not aware of any differences in the datasets needed to train them: both can be pretrained on the same raw-text corpus.
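To make the distinction concrete, here is a minimal sketch of how the same raw token sequence yields different (input, label) pairs under the two objectives. The function names and the 15% masking rate are illustrative, not taken from any specific library:

```python
import random

def causal_lm_example(tokens):
    # Causal LM (GPT-style): at each position, predict the next token.
    return tokens[:-1], tokens[1:]

def masked_lm_example(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    # Masked LM (BERT-style): hide a random subset of tokens and
    # predict only the hidden ones; other positions are ignored by the loss.
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(tok)    # loss computed on this position
        else:
            inputs.append(tok)
            labels.append(None)   # position excluded from the loss
    return inputs, labels

sentence = ["Bakı", "Azərbaycanın", "paytaxtıdır", "."]
print(causal_lm_example(sentence))
print(masked_lm_example(sentence))
```

Because both functions consume the same plain token list, the corpus-collection effort can be shared across the two model series; only the batching step differs.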