fe1ixxu / ALMA

State-of-the-art LLM-based translation models.

finetuning for a specific language #62

Closed jeff11-1-1 closed 2 weeks ago

jeff11-1-1 commented 1 month ago

I'm pretty new to the LLM space, but I found that X-ALMA provides good translations and has potential for translating subtitles, since it is based on Llama 2 and therefore understands context better. I was wondering if there is a way to fine-tune the model for English-to-Arabic translation. I have a good dataset of high-quality parallel subtitles (approximately 350 thousand rows) and was looking for a way to fine-tune the model for English to Arabic and vice versa.
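To make the question concrete, here is a rough sketch of what I had in mind: LoRA fine-tuning an X-ALMA checkpoint on parallel English-Arabic pairs with Hugging Face Transformers and PEFT. The model ID, prompt template, and hyperparameters below are my own guesses for illustration, not this repo's official recipe; the actual training scripts in the repository may do things differently.

```python
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed checkpoint name: pick the X-ALMA group model that covers Arabic.
model_name = "haoranxu/X-ALMA-13B-Group1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# ALMA-style translation prompt; the exact template is an assumption,
# check the repo's README / training scripts for the real one.
def to_features(ex):
    text = (
        "Translate this from English to Arabic:\n"
        f"English: {ex['en']}\n"
        f"Arabic: {ex['ar']}"
    )
    return tokenizer(text, truncation=True, max_length=512)

# Toy stand-in for the ~350k parallel subtitle rows.
pairs = Dataset.from_dict({
    "en": ["Where are you going?"],
    "ar": ["إلى أين أنت ذاهب؟"],
})
train_ds = pairs.map(to_features, remove_columns=["en", "ar"])

# LoRA adapters keep memory needs manageable for a 13B model.
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="xalma-en-ar-lora",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=train_ds,
    # Causal-LM collator: for simplicity the loss here covers the whole
    # sequence, including the English source; masking the source side is
    # a common refinement.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

I went with LoRA only because full fine-tuning of a 13B model needs far more GPU memory; happy to be corrected if the recommended approach differs.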

I was also wondering about the potential quality improvement from using a newer LLM (e.g., Llama 3.1). Is this possible? Is the training data available so I can try this out?

fe1ixxu commented 2 weeks ago

Thanks for asking! I guess now you can find answers in the X-ALMA paper :)