fe1ixxu / ALMA

State-of-the-art LLM-based translation models.

finetuning for a specific language #62

Closed jeff11-1-1 closed 2 weeks ago

jeff11-1-1 commented 1 month ago

I'm pretty new to the LLM space, but I found that X-ALMA provides good translations and has potential for translating subtitles, since it is based on Llama 2 and therefore understands context better. I was wondering if there is a way to fine-tune the model for English-to-Arabic translation. I have a good dataset of high-quality parallel subtitles (approximately 350 thousand rows) and was looking for a way to fine-tune the model for English to Arabic and vice versa.
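To make the question concrete, here is a rough sketch of what I had in mind: LoRA fine-tuning an X-ALMA checkpoint on parallel English-Arabic pairs with Hugging Face Transformers and PEFT. The model ID, prompt template, and hyperparameters below are my own guesses for illustration, not this repo's official recipe; the actual training scripts in the repository may do things differently.

```python
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed checkpoint name: pick the X-ALMA group model that covers Arabic.
model_name = "haoranxu/X-ALMA-13B-Group1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# ALMA-style translation prompt; the exact template is an assumption,
# check the repo's README / training scripts for the real one.
def to_features(ex):
    text = (
        "Translate this from English to Arabic:\n"
        f"English: {ex['en']}\n"
        f"Arabic: {ex['ar']}"
    )
    return tokenizer(text, truncation=True, max_length=512)

# Toy stand-in for the ~350k parallel subtitle rows.
pairs = Dataset.from_dict({
    "en": ["Where are you going?"],
    "ar": ["إلى أين أنت ذاهب؟"],
})
train_ds = pairs.map(to_features, remove_columns=["en", "ar"])

# LoRA adapters keep memory needs manageable for a 13B model.
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="xalma-en-ar-lora",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=train_ds,
    # Causal-LM collator: for simplicity the loss here covers the whole
    # sequence, including the English source; masking the source side is
    # a common refinement.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

I went with LoRA only because full fine-tuning of a 13B model needs far more GPU memory; happy to be corrected if the recommended approach differs.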

I was also wondering about the potential quality improvement from using a newer LLM (e.g., Llama 3.1). Is this possible? Is the training data available so I can try this out?

fe1ixxu commented 2 weeks ago

Thanks for asking! I guess now you can find answers in the X-ALMA paper :)