fe1ixxu / ALMA

State-of-the-art LLM-based translation models.
MIT License
330 stars 23 forks

Question about ALMA(R) #45

Open mru4913 opened 2 weeks ago

mru4913 commented 2 weeks ago

Hi Felix,

I am currently working on a machine-translation task: improving an LLM's ability to translate among a few languages, such as ja-zh, zh-ja, and zh-en. I have a few questions:

1. Have you tried LoRA fine-tuning for the first (monolingual fine-tuning) stage instead of full-weight fine-tuning?
2. Why do you start from the base LLM rather than its chat/instruct variant?

@fe1ixxu

fe1ixxu commented 2 weeks ago

Thanks for your interest! As for your questions:

  1. Yes, we've previously attempted LoRA fine-tuning in the first stage. However, LoRA struggles to acquire broad multilingual knowledge because of its limited capacity, so this approach might work better with a significantly larger LoRA rank (a rough sketch of such a setup is below, after point 2).

  2. The primary reason we use the base LLM is to continue pre-training. Continuing pre-training on top of the chat model would likely diminish its previously learned chat ability. If we want to train a chat-ALMA model, we should consider doing so after completing stage 1.
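
For concreteness, here is a minimal sketch (not the ALMA training code) of what stage-1 style continued pre-training with a larger-rank LoRA adapter could look like using HuggingFace PEFT. The model name, rank, and target modules are illustrative placeholders, not the settings used in the paper:

```python
# Minimal sketch: stage-1 style monolingual continued pre-training
# with a larger-rank LoRA adapter (HuggingFace transformers + peft).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Placeholder: a base (not chat) checkpoint, per point 2 above.
base_model = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# A much larger rank than the common r=16/32 defaults, to give the
# adapter more capacity for absorbing multilingual knowledge (point 1).
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, train with the standard causal-LM objective on monolingual
# ja/zh/en text (e.g. via transformers.Trainer), then move on to stage 2
# (parallel-data fine-tuning, or CPO for ALMA-R).
```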

mru4913 commented 1 week ago

Thx for your reply.