What do I need to add a new language?

Hi, thanks for your interest! The answer is quite straightforward: follow the instructions linked here. In the mono_ft.sh bash script, add ar to the language list after --oscar_data_lang and put the sampling probability you want for each language after --interleave_probs. For instance, to fine-tune the model on a mix of 50% English (to keep the model from forgetting English) and 50% Arabic, you would write:

```bash
....
--oscar_data_lang en,ar \
--interleave_probs 0.5,0.5 \
....
```
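If you end up mixing more than two languages, it is easy to get the two lists out of step. Here is a minimal sketch of a sanity check you could run before launching the job. It is not part of the ALMA repo: `parse_language_mix` is a hypothetical helper, and the rules it enforces (one probability per language, probabilities summing to 1) are my assumption about what a sensible mix looks like, not something the script is documented to require.

```python
import math


def parse_language_mix(langs: str, probs: str) -> dict[str, float]:
    """Pair each language code with its sampling probability.

    Hypothetical helper: `langs` and `probs` are the comma-separated strings
    you would pass to --oscar_data_lang and --interleave_probs.
    """
    lang_list = [s.strip() for s in langs.split(",") if s.strip()]
    prob_list = [float(p) for p in probs.split(",") if p.strip()]

    # Assumption: every language needs exactly one probability.
    if len(lang_list) != len(prob_list):
        raise ValueError(
            f"{len(lang_list)} languages but {len(prob_list)} probabilities"
        )
    # Assumption: probabilities are non-negative and sum to 1.
    if any(p < 0 for p in prob_list):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(prob_list), 1.0, abs_tol=1e-6):
        raise ValueError(f"probabilities sum to {sum(prob_list)}, expected 1.0")

    return dict(zip(lang_list, prob_list))


if __name__ == "__main__":
    print(parse_language_mix("en,ar", "0.5,0.5"))  # {'en': 0.5, 'ar': 0.5}
```

A mismatched pair such as `parse_language_mix("en,ar", "0.5")` raises a `ValueError` instead of silently producing a bad mix.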

You can replace meta-llama/Llama-2-7b-hf with haoranxu/ALMA-7B-Pretrain or haoranxu/ALMA-13B-Pretrain to start from our models. Note that the model is still not a translation model after this monolingual fine-tuning. You may still need to fine-tune it on parallel sentences: https://github.com/fe1ixxu/ALMA#parallel-data-fine-tuning-full-weight.
