foundation-model-stack / fms-fsdp

🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
https://pytorch.org/docs/stable/fsdp.html
Apache License 2.0

Support nested folders for datasets #91

Status: Open. thinkahead opened this issue 1 month ago.

thinkahead commented 1 month ago

The current code only looks for files at the top level of the dataset folder. When the dataset contains nested subfolders, the arrow files inside them are not found.
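The issue does not show the loader's code. The following is a minimal sketch, assuming the dataset is a directory tree of `.arrow` files. It contrasts a top-level-only listing with a recursive walk. The helper names `find_arrow_files_flat` and `find_arrow_files` are hypothetical and are not taken from fms-fsdp or from the linked pull request.

```python
import os
from typing import List


def find_arrow_files_flat(root: str, suffix: str = ".arrow") -> List[str]:
    """Hypothetical helper: only inspects files directly inside `root`,
    so anything in a nested subfolder is missed (the behavior in this issue)."""
    return sorted(
        name
        for name in os.listdir(root)
        if name.endswith(suffix) and os.path.isfile(os.path.join(root, name))
    )


def find_arrow_files(root: str, suffix: str = ".arrow") -> List[str]:
    """Hypothetical helper: walks `root` recursively and returns the path of
    every matching file, relative to `root`, in sorted order."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full_path = os.path.join(dirpath, name)
                matches.append(os.path.relpath(full_path, root))
    # Sort so the result does not depend on filesystem listing order.
    return sorted(matches)
```

For a folder holding `a.arrow`, `sub/b.arrow`, and `sub/deeper/c.arrow`, the flat version returns only `a.arrow`, while the recursive version returns all three relative paths. Sorting is a choice made in this sketch: a distributed loader that assigns files to ranks by index would typically need every rank to see the same order.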

thinkahead commented 1 month ago

Pull request https://github.com/foundation-model-stack/fms-fsdp/pull/90