Thanks for your interest! As for your questions:
Yes, we've previously attempted LoRA fine-tuning in the first stage. However, LoRA fine-tuning struggles to acquire broad multilingual knowledge because of its limited capacity. This approach might work better with a significantly larger LoRA.
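For concreteness, here is a rough sketch of what a higher-capacity LoRA setup could look like with the PEFT library; the rank, alpha, and target modules below are illustrative choices, not the settings we used:

```python
# Minimal sketch: attach a larger-capacity LoRA adapter for stage-1 style
# continued pre-training. Values are illustrative, not from the ALMA recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_cfg = LoraConfig(
    r=256,                      # much larger rank than the usual 8-64
    lora_alpha=512,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # shows how much capacity the adapter adds
```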
The primary reason we use the base LLM is for continued pre-training. Continued pre-training on a chat model would likely diminish its previously learned chat ability. If we plan to train a chat-ALMA model, it would be better to do that after completing stage 1.
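If it helps, a minimal sketch of stage-1 style continued pre-training on monolingual text with full weights is below, assuming a causal-LM base checkpoint; the model name, dataset file, and hyperparameters are placeholders, not the exact ALMA recipe:

```python
# Minimal sketch of continued pre-training on monolingual data (ja/zh/en).
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

base_name = "meta-llama/Llama-2-7b-hf"  # base (non-chat) checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_name)

# Placeholder monolingual corpus covering the target languages.
raw = load_dataset("text", data_files={"train": "mono_ja_zh_en.txt"})
tokenized = raw["train"].map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="alma-stage1-sketch",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```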
Thanks for your reply.
Hi Felix,
I am currently working on a machine-translation task: improving an LLM's ability to translate among a few languages, such as ja-zh, zh-ja, zh-en, etc. I have a few questions:
@fe1ixxu