Hi, thanks for your interest and sorry about the delayed response. For `mono_ft.sh`, you can use `AutoModelForCausalLM.from_config(config)` where the model is loaded: https://github.com/fe1ixxu/ALMA/blob/b92304c6548b7d0af0cdadca9d63c07c70d19cd5/utils/utils.py#L364

Thank you for the clarification!!
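For reference, the swap being discussed would look roughly like this (a minimal sketch; the `AutoConfig` step and the checkpoint name are assumptions for illustration, not the repo's code):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Architecture hyperparameters only (layer count, hidden size, vocab, ...), no weights
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")

# Current default: continue training from the pretrained Llama-2 checkpoint
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# From-scratch alternative: same architecture, randomly initialized weights
model = AutoModelForCausalLM.from_config(config)
```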
Thank you for sharing the ALMA / ALMA-R models and fine-tuning scripts!
Hypothetically, if we wanted to replicate the ALMA experiments by training the model from scratch, how would we do it?
We have some questions on how we can train an ALMA model from scratch:

1) Which monolingual datasets were used to train the original ALMA model?
2) Which bitext datasets were used for the LoRA fine-tuning?
3) Were any of the previous WMT datasets, especially the dev/test sets, used in the released ALMA / ALMA-R models?
4) When we use `mono_ft.sh`, the default model is set to `AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')`. In that case, if we were to train ALMA from scratch, we wouldn't really have to change anything in `mono_ft.sh` other than the data from question (1), is that correct?
5) And if we want to train even the base Llama model from scratch, we would have to do something like https://discuss.huggingface.co/t/how-does-one-reinitialize-the-weights-of-a-hugging-face-llama-v2-model-the-official-way-as-the-original-model/62547 (see the sketch below), is that right?

Thank you in advance for the answers!
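Regarding (5), this is the kind of re-initialization we have in mind, as a minimal sketch; whether it matches the "official way" in the linked thread is our assumption, so treat the exact calls as illustrative:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Take only the architecture definition from the released checkpoint
config = LlamaConfig.from_pretrained("meta-llama/Llama-2-7b-hf")

# Constructing the model directly from the config yields freshly initialized
# weights, i.e. no pretrained Llama-2 parameters are ever loaded
model = LlamaForCausalLM(config)
```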