foundation-model-stack / fms-fsdp

🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
https://pytorch.org/docs/stable/fsdp.html

align with FMS config #26

Closed lchu-ibm closed 4 months ago

lchu-ibm commented 4 months ago

to match with https://github.com/foundation-model-stack/foundation-model-stack/blob/b06c5dfb6093f3a422f8a5d9bcff57ac81eedf5b/fms/models/llama.py#L342-L353
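For context, a minimal sketch of what that alignment might look like on the FMS side (the `LLaMAConfig` field names are assumed from `fms.models.llama`; the 34B-style values below are illustrative guesses, not values read from the linked lines):

```python
# A sketch only, not code from either repo: field names assumed from
# fms.models.llama.LLaMAConfig; the 34B-style values are illustrative.
from fms.models.llama import LLaMAConfig

config_34b = LLaMAConfig(
    src_vocab_size=32000,  # LLaMA tokenizer vocabulary
    emb_dim=8192,
    nlayers=48,
    nheads=64,
    kvheads=8,  # GQA: 8 kv-heads shared across 64 query heads
)
```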

lchu-ibm commented 4 months ago

@daviswer Based on the HF config, they are. They also share the same kvheads. The parameter count also came out to precisely 34B.
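For reference, a back-of-envelope count (a sketch; the shapes below are assumed Code Llama 34B-style hyperparameters, not values read from either config) shows why such a config lands near 34B parameters:

```python
# Rough parameter count for a 34B-style LLaMA with GQA.
# All shape values here are assumptions for illustration.
emb_dim, nlayers, nheads, kvheads = 8192, 48, 64, 8
ffn_dim, vocab = 22016, 32000
head_dim = emb_dim // nheads

attn = 2 * emb_dim * emb_dim                # Q and output projections
attn += 2 * emb_dim * kvheads * head_dim    # K and V, shrunk by GQA
mlp = 3 * emb_dim * ffn_dim                 # gate, up, down projections
total = nlayers * (attn + mlp) + 2 * vocab * emb_dim  # + embeddings, LM head
print(f"{total / 1e9:.1f}B")                # ~33.7B
```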

raghukiran1224 commented 4 months ago

Llama 2 34B and 70B use GQA, which is different from Llama 1.
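For readers following along: in grouped-query attention (GQA), kvheads is smaller than nheads, so each key/value head is shared by a group of query heads, whereas Llama 1 uses plain multi-head attention (kvheads == nheads). A minimal sketch of one common way to run GQA through PyTorch SDPA (illustrative only, not the repo's implementation):

```python
# Illustrative GQA sketch, not fms-fsdp code: expand 8 kv-heads so each
# is shared by a group of 8 query heads, then call SDPA as usual.
import torch
import torch.nn.functional as F

B, T, nheads, kvheads, head_dim = 2, 16, 64, 8, 128
q = torch.randn(B, nheads, T, head_dim)
k = torch.randn(B, kvheads, T, head_dim)
v = torch.randn(B, kvheads, T, head_dim)

k = k.repeat_interleave(nheads // kvheads, dim=1)  # (B, 64, T, head_dim)
v = v.repeat_interleave(nheads // kvheads, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 64, 16, 128])
```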