Hi, thanks for the interest! The answer is quite straightforward: simply follow the instructions linked here. In the mono_ft.sh bash file, add ar after --oscar_data_lang and put the sampling probabilities you intend to use after --interleave_probs. For instance, if your goal is to fine-tune the model with an equal distribution of 50% English (to prevent the model from forgetting English) and 50% Arabic, you would proceed as follows:
....
--oscar_data_lang en,ar \
--interleave_probs 0.5,0.5 \
....
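To make the edit concrete, here is a minimal sketch of what the relevant part of mono_ft.sh might look like after the change. Only --oscar_data_lang and --interleave_probs come from the instructions above; the launcher, script name, and remaining flags are assumptions in the style of typical HuggingFace training scripts, so keep whatever the actual file already contains:

```bash
# Hypothetical excerpt of mono_ft.sh after the edit.
# Launcher, script name, and all flags other than --oscar_data_lang and
# --interleave_probs are assumptions; keep the file's existing values.
accelerate launch run_llmmt.py \
    --model_name_or_path haoranxu/ALMA-7B-Pretrain \
    --do_train \
    --oscar_data_lang en,ar \
    --interleave_probs 0.5,0.5 \
    --output_dir ./alma-7b-ar-mono-ft
```

With 0.5,0.5, training examples are sampled from the English and Arabic OSCAR streams with equal probability; shifting to, say, 0.3,0.7 would oversample Arabic at the cost of more English forgetting.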
You can replace meta-llama/Llama-2-7b-hf with haoranxu/ALMA-7B-Pretrain or haoranxu/ALMA-13B-Pretrain to start from our models. Note that after this fine-tuning, the model is not a translation model; you may still need to fine-tune it on parallel sentences: https://github.com/fe1ixxu/ALMA#parallel-data-fine-tuning-full-weight.
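Putting both stages together, the overall recipe looks roughly like the sketch below. The script name for the parallel stage is a guess on my part; follow the README section linked above for the exact command and data format:

```bash
# Stage 1: monolingual fine-tuning on OSCAR data, with the en,ar /
# 0.5,0.5 edits described above.
bash mono_ft.sh

# Stage 2: fine-tune the stage-1 checkpoint on en<->ar parallel
# sentences. The script name here is hypothetical; see the linked
# README section for the actual invocation.
bash parallel_ft.sh
```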
First of all, thank you for this great project. My question is simple: what do I need to make ALMA learn a new language (Arabic)?