foundation-model-stack / fms-fsdp

🚀 Efficient (pre)training of foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
https://pytorch.org/docs/stable/fsdp.html
Apache License 2.0

Enable torch.compile support #45

Closed · lchu-ibm closed this 6 months ago

lchu-ibm commented 6 months ago

We turned off torch.compile support a while ago due to (1) a compile accuracy issue and (2) a graph break when compiling RoPE.

Now that both are fixed, we should turn compile support back on.
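For context, a minimal sketch of what re-enabling compile can look like in a training script; the toy model and the `use_torch_compile` flag are illustrative stand-ins, not the repo's actual config fields:

```python
import torch
import torch.nn as nn

# Stand-in model; in fms-fsdp this would be the FSDP-wrapped LLaMA model.
model = nn.TransformerEncoderLayer(d_model=512, nhead=8)

use_torch_compile = True  # hypothetical flag, not a real repo config field
if use_torch_compile:
    # dynamo traces the module and inductor lowers it to fused kernels
    model = torch.compile(model)

y = model(torch.randn(4, 16, 512))
```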

Initial experiments show consistent loss curves across runs (non-compile vs. compile-with-ac vs. compile-without-ac vs. compile-with-selective-ac).

[figure: overlapping training loss curves for the four runs]
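For reference, a toy sketch of how the compile-with-selective-ac combination can be wired up using non-reentrant activation checkpointing; the block type and the checkpoint-every-other-layer policy below are assumptions for illustration, not the repo's actual AC policy:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Stack(nn.Module):
    def __init__(self, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=256, nhead=4)
            for _ in range(n_layers)
        )

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i % 2 == 0:
                # selective AC: recompute this layer's activations in backward
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x

model = torch.compile(Stack())
out = model(torch.randn(2, 8, 256))
out.sum().backward()
```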
lchu-ibm commented 6 months ago

We also need to raise `accumulated_cache_size_limit` to make the 70B model compilable; otherwise torch._dynamo hits `config.accumulated_cache_size_limit` (64) and breaks graph compilation.
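A hedged sketch of the workaround; the value 128 is illustrative, as the limit needed depends on the model size and PyTorch version:

```python
import torch._dynamo

# Bump dynamo's accumulated compile-cache ceiling before compiling. The
# default (64) is exhausted once a large model's many transformer blocks
# each contribute their own compiled frame, after which dynamo stops
# compiling and falls back to eager mid-model.
torch._dynamo.config.accumulated_cache_size_limit = 128  # illustrative value
```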

Fixed in https://github.com/foundation-model-stack/fms-fsdp/pull/45/commits/8e0bfa16aa6eb0e48f53704bc7d628b4f7108c9c

Related: https://github.com/pytorch/pytorch/issues/114511