fe1ixxu / ALMA

State-of-the-art LLM-based translation models.
MIT License
330 stars 23 forks

Question about ALMA(R) #45

Open mru4913 opened 2 weeks ago

mru4913 commented 2 weeks ago

Hi Felix,

I am currently working on a machine-translation task: improving an LLM's ability to translate among a few languages, such as ja-zh, zh-ja, and zh-en. I have a few questions:

1. Have you tried LoRA fine-tuning for the first (monolingual fine-tuning) stage instead of full-weight fine-tuning?
2. Why do you start from the base LLM rather than its chat/instruct variant?

@fe1ixxu

fe1ixxu commented 2 weeks ago

Thanks for your interest! As for your questions:

  1. Yes, we've previously attempted LoRA fine-tuning in the first stage. However, LoRA struggles to acquire broad multilingual knowledge because of its limited capacity, so this approach might work better with a significantly larger LoRA rank (a rough sketch of such a setup is below, after point 2).

  2. The primary reason we use the base LLM is to continue pre-training. Continuing pre-training on top of the chat model would likely diminish its previously learned chat ability. If we want to train a chat-ALMA model, we should consider doing so after completing stage 1.
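
For concreteness, here is a minimal sketch (not the ALMA training code) of what stage-1 style continued pre-training with a larger-rank LoRA adapter could look like using HuggingFace PEFT. The model name, rank, and target modules are illustrative placeholders, not the settings used in the paper:

```python
# Minimal sketch: stage-1 style monolingual continued pre-training
# with a larger-rank LoRA adapter (HuggingFace transformers + peft).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Placeholder: a base (not chat) checkpoint, per point 2 above.
base_model = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# A much larger rank than the common r=16/32 defaults, to give the
# adapter more capacity for absorbing multilingual knowledge (point 1).
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, train with the standard causal-LM objective on monolingual
# ja/zh/en text (e.g. via transformers.Trainer), then move on to stage 2
# (parallel-data fine-tuning, or CPO for ALMA-R).
```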

mru4913 commented 1 week ago

Thx for your reply.