Hello. I'm currently training my model based on the principles you've outlined.
I have a few inquiries I'd like to make.
What's the reason behind selecting LLaMA-2 as the foundational model? Is it possible to use a different model, such as Qwen or Mistral?
Regarding CPO data, I have a dataset of several thousand pairs. Is it feasible to train with this dataset and also use it as CPO data at the same time? (My plan is to create synthetic data with GPT-4 in combination with my own pre-trained model.)
Have you ever tested a larger model, beyond 13B? I was wondering if I could also use models larger than 30B.
Thanks for your interest and sorry about the delayed response!
The reason for choosing LLaMA-2 is that it performed the best at zero-shot translation compared with other LLMs at the time I was doing the project. See Section 2 in the paper.
I think it should be feasible.
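For the CPO data point, here is a minimal sketch of one way to turn paired outputs (a GPT-4 translation and your own model's translation of the same source) into chosen/rejected preference pairs. The field names (`prompt`, `chosen`, `rejected`), the prompt template, and the ranking step are assumptions for illustration, not the repo's exact schema; in practice you would pick the preferred candidate with something like a quality-estimation metric.

```python
# Minimal sketch (not the official ALMA pipeline): building CPO-style preference
# pairs from two candidate translations per source sentence. Field names and the
# prompt template below are assumptions, not the repo's exact data schema.
import json

def build_cpo_pairs(records):
    """records: iterable of dicts with 'source', 'gpt4_translation',
    'own_model_translation', and a 'preferred' flag ('gpt4' or 'own')."""
    pairs = []
    for r in records:
        if r["preferred"] == "gpt4":
            chosen, rejected = r["gpt4_translation"], r["own_model_translation"]
        else:
            chosen, rejected = r["own_model_translation"], r["gpt4_translation"]
        pairs.append({
            "prompt": f"Translate this from English to German:\nEnglish: {r['source']}\nGerman:",
            "chosen": chosen,
            "rejected": rejected,
        })
    return pairs

if __name__ == "__main__":
    example = [{
        "source": "The weather is nice today.",
        "gpt4_translation": "Das Wetter ist heute schön.",
        "own_model_translation": "Das Wetter ist schön heute.",
        "preferred": "gpt4",  # e.g. decided by a quality-estimation score
    }]
    with open("cpo_pairs.jsonl", "w", encoding="utf-8") as f:
        for p in build_cpo_pairs(example):
            f.write(json.dumps(p, ensure_ascii=False) + "\n")
```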
We have not tried the ALMA recipe on a larger model, but it is on its way!