foundation-model-stack / fms-fsdp

🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
https://pytorch.org/docs/stable/fsdp.html
Apache License 2.0

revert "raise Dynamo accumulated cache size limit" #53

Open lchu-ibm opened 6 months ago

lchu-ibm commented 6 months ago

We recently added a commit to raise the Dynamo accumulated cache size limit so that compile works with large models like 70B, whose number of layers exceeds the default limit (64): https://github.com/foundation-model-stack/fms-fsdp/pull/45#issuecomment-2002564455.
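For context, a minimal sketch of the kind of override being discussed (the exact value and placement in fms-fsdp PR #45 may differ; the toy model here is only for illustration):

```python
import torch
import torch._dynamo

# Llama-70B has 80 decoder layers; with per-layer compilation this exceeds
# the old default accumulated_cache_size_limit of 64, so raise the limit
# before calling torch.compile. (128 is an illustrative value.)
torch._dynamo.config.accumulated_cache_size_limit = 128

# Toy stand-in for the FSDP-wrapped transformer used in fms-fsdp.
model = torch.nn.Sequential(*[torch.nn.Linear(16, 16) for _ in range(80)])
compiled = torch.compile(model)
out = compiled(torch.randn(2, 16))
```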

This limit has now been raised in PyTorch itself (related PR) from 64 to 256, so we can revert that commit, as the new default is sufficient.

I am holding off on the revert because the torch PR was only merged into today's nightly, and most environments do not have this "fix" yet.

This should be revisited once that PR has been picked up by most of the environments we use; at that point we can revert the commit.
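One possible interim approach, sketched here as an assumption rather than a plan from this thread, is to make the override conditional so it becomes a no-op on newer PyTorch builds (NUM_LAYERS and the chosen values are illustrative):

```python
import torch._dynamo

NUM_LAYERS = 80  # e.g. Llama-70B

# Only raise the limit if the installed PyTorch still ships the old default;
# once every environment carries the raised default (256), this block and
# the original commit can be removed entirely.
if torch._dynamo.config.accumulated_cache_size_limit < NUM_LAYERS:
    torch._dynamo.config.accumulated_cache_size_limit = 2 * NUM_LAYERS
```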