🚀 Efficiently (pre)train foundation models with native PyTorch features, including FSDP for distributed training and the SDPA implementation of FlashAttention-2.
Currently, the transformer block is defined and used in several places, including the ac_handler and fsdp_wrapper modules. This PR centralizes these definitions in main (where the model is defined), making it easier to switch from one model to another.
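A minimal sketch of the pattern this PR moves toward: the block class is defined (or looked up) once in main and passed to the downstream handlers, instead of each handler hard-coding it. All names here (`TransformerBlock`, `get_block_class`, `apply_fsdp`, `apply_ac`) are illustrative assumptions, not the repo's actual API, and `torch` is deliberately omitted to keep the sketch self-contained.

```python
# Hypothetical sketch of centralizing the transformer-block class in one place.
# None of these names are taken from the actual codebase.

class TransformerBlock:
    """Stand-in for a model's transformer block."""

class LlamaBlock(TransformerBlock):
    """Stand-in for one concrete model's block."""

# main defines the model and exposes its block class exactly once...
_BLOCK_REGISTRY: dict[str, type] = {"llama": LlamaBlock}

def get_block_class(model_name: str) -> type:
    """Single source of truth for which block class a model uses."""
    return _BLOCK_REGISTRY[model_name]

# ...and the wrapping/checkpointing helpers receive it as an argument,
# rather than importing or redefining the block themselves.
def apply_fsdp(block_cls: type) -> str:
    # In the real code this would build an FSDP auto-wrap policy
    # keyed on the block class.
    return f"FSDP auto-wrap policy keyed on {block_cls.__name__}"

def apply_ac(block_cls: type) -> str:
    # In the real code this would apply activation checkpointing
    # to every instance of the block class.
    return f"activation checkpointing applied to {block_cls.__name__}"

block_cls = get_block_class("llama")
print(apply_fsdp(block_cls))
print(apply_ac(block_cls))
```

Switching models then means changing a single registry entry, with ac_handler and fsdp_wrapper following along automatically.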