foundation-model-stack / fms-fsdp

🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
https://pytorch.org/docs/stable/fsdp.html
Apache License 2.0

add llama3 8b config #76

Closed lchu-ibm closed 2 months ago

lchu-ibm commented 2 months ago

Add llama3 8b config.

We also expose `vocab_size` so the dummy dataloader is configurable for llama3, letting us override the default 32k vocabulary with llama3's 128k.
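As a rough sketch of the idea, a training config could expose `vocab_size` so a dummy (random-token) dataloader generates ids in the right range for llama3. The field and function names below are illustrative, not the repo's actual config:

```python
from dataclasses import dataclass
import random

@dataclass
class TrainConfig:
    # Illustrative fields only; the real fms-fsdp config may differ.
    model_variant: str = "llama3_8b"
    batch_size: int = 2
    seq_length: int = 8
    vocab_size: int = 128256  # llama3 uses a ~128k vocab; llama2 used 32k

def dummy_batch(cfg: TrainConfig) -> list[list[int]]:
    # Random token ids in [0, vocab_size), shaped (batch_size, seq_length).
    return [
        [random.randrange(cfg.vocab_size) for _ in range(cfg.seq_length)]
        for _ in range(cfg.batch_size)
    ]

cfg = TrainConfig()
batch = dummy_batch(cfg)
```

With `vocab_size` exposed this way, switching from a llama2-sized to a llama3-sized vocabulary is a one-field override rather than a code change.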