foundation-model-stack / fms-fsdp

🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
https://pytorch.org/docs/stable/fsdp.html
Apache License 2.0

fix dummy dataloader for a larger simulated vocab size #25

Closed lchu-ibm closed 4 months ago

lchu-ibm commented 4 months ago

The current dummy dataloader used for perf benchmarking has a simulated vocab size of 10k.

We should change this to 32k, since the 10k size somehow triggered some as-yet-undiagnosed corner-case errors.
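A dummy dataloader of the kind described above might look like the following minimal sketch. This is illustrative only, not the repo's actual implementation; the class name `DummyDataset` and parameters `vocab_size`, `seq_len`, and `num_samples` are assumptions, with the vocab size defaulting to the proposed 32k.

```python
# Hypothetical sketch of a perf-benchmarking dummy dataloader; not the
# actual fms-fsdp code. It yields random token ids in [0, vocab_size).
import torch
from torch.utils.data import Dataset, DataLoader


class DummyDataset(Dataset):
    def __init__(self, vocab_size: int = 32_000, seq_len: int = 2048,
                 num_samples: int = 1_000):
        self.vocab_size = vocab_size
        self.seq_len = seq_len
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # Draw seq_len + 1 random tokens so that input/label pairs can be
        # built by shifting one position.
        tokens = torch.randint(0, self.vocab_size, (self.seq_len + 1,))
        return tokens[:-1], tokens[1:]


loader = DataLoader(DummyDataset(vocab_size=32_000, seq_len=128), batch_size=2)
inputs, labels = next(iter(loader))
```

Because the tokens are random, this measures pure training throughput without any tokenization or I/O cost, which is the point of a perf-benchmarking dataloader.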