combine FSDP with selective activation checkpointing

Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.

https://lightning.ai

Apache License 2.0

combine FSDP with selective activation checkpointing #1366

Open nemoramo opened 2 weeks ago

nemoramo commented 2 weeks ago

Consider integrating selective activation checkpointing, as featured in PyTorch's blog post "Maximizing Training Throughput", into LitGPT. Rather than checkpointing every transformer block, it checkpoints only a subset of them, giving up some of the memory savings in exchange for less recomputation. Adding a selective_activation_checkpointing kwarg would let users combine this strategy with FSDP and train larger models.
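To make the request concrete, here is a minimal sketch of what such an option might do under the hood. It assumes the "checkpoint every n-th block" policy. The helper names and the every_n parameter are hypothetical and not part of LitGPT. The wrapper functions come from PyTorch's torch.distributed.algorithms._checkpoint.checkpoint_wrapper module, which is a private path and may change between releases.

```python
from typing import List


def select_blocks_to_checkpoint(num_blocks: int, every_n: int) -> List[int]:
    """Return the indices of the blocks to checkpoint: every `every_n`-th one.

    `every_n` is a hypothetical knob; every_n=1 checkpoints all blocks.
    """
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    return [i for i in range(num_blocks) if i % every_n == 0]


def apply_selective_checkpointing(model, block_cls, every_n: int) -> None:
    """Wrap every `every_n`-th `block_cls` submodule of `model` in place.

    Hypothetical helper, not an existing LitGPT function.
    """
    # Imported lazily so the selection helper above works without PyTorch.
    from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
        apply_activation_checkpointing,
        checkpoint_wrapper,
    )

    # Collect the transformer blocks in registration order.
    blocks = [m for m in model.modules() if isinstance(m, block_cls)]
    chosen = {
        id(blocks[i])
        for i in select_blocks_to_checkpoint(len(blocks), every_n)
    }

    # Only modules whose identity was selected above get wrapped.
    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=checkpoint_wrapper,
        check_fn=lambda module: id(module) in chosen,
    )
```

With 6 blocks and every_n=2, blocks 0, 2 and 4 would be checkpointed and the rest would keep their activations in memory. A larger every_n recomputes less but saves less memory, so the value would need tuning per model size and GPU.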