Thunder should support activation checkpointing, activation offloading, and sequence parallelism to enable long context models.
Modifying the benchmark script to measure PyTorch performance on these models is a good first step and keeps reminding us of the need for this feature.
🚀 Feature
Add activation checkpointing to the benchmark_litgpt script.
Motivation
LitGPT models such as Mistral-7B-v0.2, vicuna-13b-v1.5-16k, longchat-13b-16k, CodeLlama-13b-hf, and CodeLlama-34b-hf have larger context lengths, which makes the memory needed to store activation values high. FSDP doesn't shard activations, so we can get OOM errors irrespective of the number of GPUs used.
Pitch
Changes include:

- A new parameter on Benchmark_litGPT (checkpoint_activations).
- A new setup_activation helper, which will be used depending on the value of checkpoint_activations.
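As a rough illustration of what such a helper could look like, here is a minimal sketch using PyTorch's activation-checkpointing wrapper; the function name, the import of LitGPT's Block, and the wrapping policy are assumptions, not the final implementation:

```python
# Hypothetical sketch of an activation-checkpointing helper for benchmark_litgpt.
# The helper name and the per-Block wrapping policy are assumptions.
import torch
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    checkpoint_wrapper,
)
from litgpt.model import Block  # LitGPT transformer block


def setup_activation_checkpointing(model: torch.nn.Module) -> None:
    """Wrap each transformer Block so its activations are recomputed in the
    backward pass instead of being kept in memory after the forward pass."""
    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=checkpoint_wrapper,
        check_fn=lambda submodule: isinstance(submodule, Block),
    )
```

The benchmark would then call this helper only when checkpoint_activations is set, trading extra recomputation in the backward pass for a much smaller activation memory footprint.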
Alternatives
Tensor parallelism or sequence parallelism could be alternatives.
cc @crcrpar