🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for distributed training and the SDPA implementation of Flash Attention v2.
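For reference, a minimal sketch of routing attention through SDPA's Flash Attention backend. The tensor shapes are illustrative, and the `sdpa_kernel` context manager assumes a recent PyTorch release and a CUDA device:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Illustrative shapes: (batch, heads, seq_len, head_dim).
# The Flash Attention backend requires a CUDA device and fp16/bf16 inputs.
q = torch.randn(4, 8, 2048, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(4, 8, 2048, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(4, 8, 2048, 64, device="cuda", dtype=torch.bfloat16)

# Pin dispatch to the Flash Attention kernel; is_causal=True applies the
# causal mask without materializing it.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```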
We turned off torch.compile support a while ago due to (1) a compile accuracy issue and (2) a graph break when compiling RoPE.
Now that both are fixed, we should turn it back on to support compile.
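A minimal sketch of re-enabling compile per transformer block, assuming the model exposes its blocks as an `nn.ModuleList` named `model.layers` (this structure is an assumption for illustration, not this repo's exact code):

```python
import torch

# Compile each transformer block individually rather than the whole model;
# per-block compilation keeps graphs small and composes with activation
# checkpointing wrappers applied at the block level.
for i, block in enumerate(model.layers):
    model.layers[i] = torch.compile(block)
```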
Initial experiments show consistent loss curves across different runs (non-compile vs. compile-with-ac vs. compile-without-ac vs. compile-with-selective-ac).
We also need to raise `accumulated_cache_size_limit` to make the 70B model compilable; otherwise Dynamo throws `torch._dynamo hit config.accumulated_cache_size_limit (64)` and breaks graph compilation.
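A sketch of the config bump, using the `torch._dynamo.config.accumulated_cache_size_limit` knob named in the error above; the value 128 is illustrative, not a tuned number:

```python
import torch._dynamo

# The default limit (64) is too low for the 70B model's layer count when
# compiling per block; raising it avoids the
# "torch._dynamo hit config.accumulated_cache_size_limit (64)" graph break.
torch._dynamo.config.accumulated_cache_size_limit = 128
```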