Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.

Add activation checkpointing to litGPT benchmarking script #375

Closed · mpatel31415 closed 5 months ago

mpatel31415 commented 5 months ago

🚀 Feature

Add activation checkpointing to the benchmark_litgpt script.

Motivation

LitGPT models such as Mistral-7B-v0.2, vicuna-13b-v1.5-16k, longchat-13b-16k, CodeLlama-13b-hf, and CodeLlama-34b-hf have larger context lengths, which makes the memory needed to store activation values high. FSDP doesn't shard activations, so we can get OOM errors irrespective of the number of GPUs used.

Pitch

Changes include:
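As a rough illustration (not the actual patch), activation checkpointing for the litGPT transformer blocks could be wired into the benchmark script with PyTorch's `apply_activation_checkpointing` utility. The helper name, the `enabled` flag, and the assumption that the blocks are `litgpt.model.Block` instances are mine, not from the script:

```python
# Illustrative sketch only: the integration point in benchmark_litgpt.py and
# the flag/helper names below are assumptions, not the proposed patch itself.
import functools

import torch
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointImpl,
    apply_activation_checkpointing,
    checkpoint_wrapper,
)


def maybe_enable_activation_checkpointing(model: torch.nn.Module, enabled: bool) -> None:
    """Wrap each transformer block so its activations are recomputed in backward."""
    if not enabled:
        return

    wrapper = functools.partial(
        checkpoint_wrapper,
        checkpoint_impl=CheckpointImpl.NO_REENTRANT,
    )

    # Assumption: litGPT transformer blocks are instances of litgpt.model.Block.
    from litgpt.model import Block

    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=wrapper,
        check_fn=lambda module: isinstance(module, Block),
    )
```

Whether this is exposed as a CLI flag or enabled automatically for long-context models is an open design choice for the script.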

Alternatives

Another alternative could be tensor parallelism or sequence parallelism.

cc @crcrpar

IvanYashchuk commented 5 months ago

Thunder should support activation checkpointing, activation offloading, and sequence parallelism to enable long context models.

Modifying the benchmark script to measure PyTorch performance on these models is a good first step to keep reminding ourselves about the need for this feature.
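For the plain-PyTorch baseline measurement, per-block checkpointing can also be expressed directly with `torch.utils.checkpoint`; a minimal sketch (the `blocks` loop is a stand-in for however the litGPT forward pass is actually structured):

```python
# Hedged sketch of manual per-block activation checkpointing in eager PyTorch.
import torch
from torch.utils.checkpoint import checkpoint


def forward_with_checkpointing(blocks: torch.nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    for block in blocks:
        # Activations inside each block are discarded after the forward pass
        # and recomputed during backward, trading compute for memory.
        x = checkpoint(block, x, use_reentrant=False)
    return x
```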