foundation-model-stack / fms-fsdp

🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
https://pytorch.org/docs/stable/fsdp.html
Apache License 2.0

add llama3 1b version #77

Closed lchu-ibm closed 5 months ago

lchu-ibm commented 5 months ago

For ablation studies, we added a small version of llama3 (1.8b), which corresponds to the old 1.4b version of llama2.
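One plausible reading of why a 1.4b llama2 shape lands near 1.8b under llama3 is the larger tokenizer vocabulary (32,000 tokens for Llama 2 versus 128,256 for Llama 3). The sketch below only does that arithmetic. The embedding width of 2048 and the untied input/output embeddings are assumptions made for illustration, not values taken from this PR, and the helper names are hypothetical.

```python
# Rough sketch: how many extra parameters the larger Llama 3 vocabulary adds
# to an otherwise unchanged model. emb_dim=2048 and untied embeddings are
# ASSUMPTIONS for illustration; they are not read from this PR.

LLAMA2_VOCAB = 32_000    # Llama 2 tokenizer vocabulary size
LLAMA3_VOCAB = 128_256   # Llama 3 tokenizer vocabulary size


def embedding_params(vocab_size: int, emb_dim: int, tied: bool = False) -> int:
    """Parameters in the input embedding plus the output projection.

    With tied weights the two share a single table; otherwise there are two.
    """
    tables = 1 if tied else 2
    return tables * vocab_size * emb_dim


def vocab_overhead(emb_dim: int, tied: bool = False) -> int:
    """Extra parameters from switching the Llama 2 vocabulary to Llama 3's."""
    return (embedding_params(LLAMA3_VOCAB, emb_dim, tied)
            - embedding_params(LLAMA2_VOCAB, emb_dim, tied))


if __name__ == "__main__":
    extra = vocab_overhead(emb_dim=2048)  # assumed width, see note above
    print(f"{extra / 1e9:.2f}B extra parameters")
```

Under these assumptions the vocabulary change alone accounts for roughly 0.39B parameters, which is close to the gap between 1.4b and 1.8b. Other differences between the two configs (for example attention or MLP shapes) would shift the total and are not modelled here.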