epfLLM / Megatron-LLM

distributed trainer for LLMs

LLaMa and Mistral 7B pretraining support #91

Closed StephennFernandes closed 6 months ago

StephennFernandes commented 7 months ago

Hey there, I read the docs and found the LLaMa fine-tuning scripts. I was wondering if there is a way to pretrain LLaMa and Mistral models from scratch?

Please let me know if it's possible.

Thanks

martinjaggi commented 6 months ago

For llama2 you should be able to pretrain directly using the same scripts we provide for fine-tuning. The learning rate scheduler we used, for example in Meditron, is the same as for pretraining. For Mistral we haven't tried, but the same logic should apply. Please let us know how it goes or if you encounter any issues.
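For reference, a pretraining-style schedule of the kind mentioned here is usually cosine decay with a linear warmup. Below is a minimal, self-contained sketch of that shape; all names and values (`lr_at_step`, the peak/min learning rates, warmup and total step counts) are illustrative assumptions, not the exact hyperparameters or code used in this repo or in Meditron.

```python
import math

def lr_at_step(step, *, max_lr=3e-4, min_lr=3e-5, warmup_steps=2000, total_steps=100_000):
    """Linear warmup followed by cosine decay (illustrative values only)."""
    if step < warmup_steps:
        # Ramp linearly from 0 to max_lr over the warmup phase.
        return max_lr * step / warmup_steps
    # Cosine decay from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Example: peak LR at the end of warmup, decayed to min_lr at the final step.
print(lr_at_step(2_000))    # ~3e-4
print(lr_at_step(100_000))  # ~3e-5
```

The actual scheduler arguments are set on the training command line; check the repo's getting-started docs for the exact flag names and the values used for your model size.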

martinjaggi commented 6 months ago

Here are the two sets of hyperparameters I'm referring to; sorry if the previous message was not very clear: