ECP-CANDLE / Foundation

MIT License

Apply activation checkpointing to HF models #3

Closed by azton 1 year ago

azton commented 1 year ago

Short of copying every HF model into this repo (which we won't do), we need to change model initialization so that the transformer blocks are wrapped in activation checkpoints. It seems like this should work the same way as in the FSDP examples.
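A minimal sketch of what that could look like, assuming the wrapping utilities PyTorch ships alongside FSDP (`apply_activation_checkpointing` and `checkpoint_wrapper`; they live under a private `_checkpoint` module, so the import path may change between versions). `ToyBlock`, `ToyModel`, and `wrap_blocks` are hypothetical names standing in for a HF transformer block, a HF model, and our init hook; nothing here is code from this repo.

```python
import functools

import torch
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointImpl,
    CheckpointWrapper,
    apply_activation_checkpointing,
    checkpoint_wrapper,
)


class ToyBlock(nn.Module):
    """Hypothetical stand-in for a HF transformer block class."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(x)


class ToyModel(nn.Module):
    """Hypothetical stand-in for a HF model built from repeated blocks."""

    def __init__(self, dim: int = 16, n_layers: int = 3) -> None:
        super().__init__()
        self.blocks = nn.ModuleList([ToyBlock(dim) for _ in range(n_layers)])
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return self.head(x)


def wrap_blocks(model: nn.Module, block_cls: type) -> nn.Module:
    """Replace every submodule of type block_cls with a checkpointed wrapper, in place."""
    wrapper_fn = functools.partial(
        checkpoint_wrapper, checkpoint_impl=CheckpointImpl.NO_REENTRANT
    )
    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=wrapper_fn,
        check_fn=lambda module: isinstance(module, block_cls),
    )
    return model


model = wrap_blocks(ToyModel(), ToyBlock)

# Every block should now be a CheckpointWrapper around the original module.
wrapped = [m for m in model.blocks if isinstance(m, CheckpointWrapper)]

# Forward and backward still work; block activations are recomputed in backward.
loss = model(torch.randn(4, 16)).sum()
loss.backward()
```

For a real HF model the only change should be the `block_cls` argument, i.e. passing that architecture's block class (for GPT-2 that would presumably be `GPT2Block`) after the model is constructed, so none of the HF model code has to be copied in.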