Consider integrating selective activation checkpointing, as featured in PyTorch's blog "Maximizing Training Throughput", into LitGPT. Adding a selective_activation_checkpointing kwarg would let users combine this strategy with FSDP: checkpointing only a subset of the transformer blocks trades a smaller amount of recomputation for lower activation memory, which helps larger models fit in training.
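
To make the request concrete, here is a minimal sketch of one possible selection policy: checkpoint every k-th transformer block. The names `make_check_fn`, `apply_selective_checkpointing`, and `every_k` are hypothetical and only illustrate the idea; they are not existing LitGPT options, and the proposed kwarg could be defined differently. The wrapping step relies on `apply_activation_checkpointing` from PyTorch's `torch.distributed.algorithms._checkpoint.checkpoint_wrapper` module, which is a private module and may change between releases. The sketch also assumes the predicate is called once per module while the model is traversed.

```python
from functools import partial
from typing import Callable


def make_check_fn(block_type: type, every_k: int) -> Callable[[object], bool]:
    """Build a predicate that is True for every `every_k`-th instance of `block_type`.

    Hypothetical helper: the predicate is stateful and counts the blocks it has seen,
    so it assumes each module is passed to it exactly once.
    """
    if every_k < 1:
        raise ValueError("every_k must be >= 1")
    seen = 0

    def check_fn(module: object) -> bool:
        nonlocal seen
        if not isinstance(module, block_type):
            return False  # only transformer blocks are candidates
        seen += 1
        return seen % every_k == 0  # checkpoint blocks k, 2k, 3k, ...

    return check_fn


def apply_selective_checkpointing(model, block_type: type, every_k: int) -> None:
    """Wrap every `every_k`-th block of `model` with activation checkpointing, in place."""
    # Imported lazily so the selection logic above can be used without PyTorch installed.
    from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
        CheckpointImpl,
        apply_activation_checkpointing,
        checkpoint_wrapper,
    )

    # Non-reentrant checkpointing is the variant usually combined with FSDP.
    wrapper = partial(checkpoint_wrapper, checkpoint_impl=CheckpointImpl.NO_REENTRANT)
    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=wrapper,
        check_fn=make_check_fn(block_type, every_k),
    )
```

With `every_k=1` this reduces to checkpointing every block, while larger values keep more activations in memory and recompute less. In LitGPT, `block_type` would presumably be the transformer block class, and the function would be applied to the model after FSDP wrapping; both points are assumptions that would need checking against the actual setup code.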