[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Lightning-AI/lightning-thunder/blob/main/LICENSE) [![CI testing](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-testing.yml/badge.svg?event=push)](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-testing.yml) [![General checks](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-checks.yml/badge.svg?event=push)](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-checks.yml) [![Documentation Status](https://readthedocs.org/projects/lightning-thunder/badge/?version=latest)](https://lightning-thunder.readthedocs.io/en/latest/?badge=latest) [![pre-commit.ci status](https://results.pre-commit.ci/badge/github/Lightning-AI/lightning-thunder/main.svg)](https://results.pre-commit.ci/latest/github/Lightning-AI/lightning-thunder/main)

Thunder makes PyTorch models Lightning fast.
Thunder is a source-to-source compiler for PyTorch. It makes PyTorch programs faster by combining and using different hardware executors at once (for instance, nvFuser, torch.compile, cuDNN, and TransformerEngine FP8).
It supports both single and multi-GPU configurations. Thunder aims to be usable, understandable, and extensible.
> [!NOTE]
> Lightning Thunder is in alpha. Feel free to get involved, but expect a few bumps along the way.
Thunder can achieve significant speedups over standard non-compiled PyTorch code ("PyTorch eager"), through the compounding effects of optimizations and the use of best-in-class executors. The figure below shows the pretraining throughput for Llama 2 7B as implemented in LitGPT.
As shown in the plot above, Thunder achieves a 40% speedup in training throughput compared to eager code on H100 using a combination of executors including nvFuser, torch.compile, cuDNN, and TransformerEngine FP8.
Thunder also supports distributed strategies such as DDP and FSDP for training models on multiple GPUs. The following plot displays the normalized throughput measured for Llama 2 7B without FP8 mixed precision; support for FSDP is in progress.
The easiest way to get started with Thunder, requiring no extra installations or setups, is by using our Zero to Thunder Tutorial Studio.
To use Thunder on your local machine:
# install nvFuser which installs the matching nightly PyTorch
pip install --pre 'nvfuser-cu121[torch]' --extra-index-url https://pypi.nvidia.com
# install cudnn
pip install nvidia-cudnn-frontend
# install thunder
pip install lightning-thunder
Below is a simple example of how Thunder allows you to compile and run PyTorch code:
import torch
import thunder
def foo(a, b):
    return a + b
jfoo = thunder.jit(foo)
a = torch.full((2, 2), 1)
b = torch.full((2, 2), 3)
result = jfoo(a, b)
print(result)
# prints
# tensor([[4, 4],
#         [4, 4]])
The compiled function `jfoo` takes and returns PyTorch tensors, just like the original function, so modules and functions compiled by Thunder can be used as part of larger PyTorch programs.
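As a minimal sketch of that interoperability, the snippet below compiles a small `torch.nn.Module` with `thunder.jit` and passes its output to an ordinary PyTorch op. `TinyMLP` is a hypothetical model made up for illustration, and the fallback to eager PyTorch when Thunder is not installed is an assumption added so the sketch still runs; it is not something Thunder provides.

```python
import torch
import torch.nn as nn

try:
    import thunder
    compile_fn = thunder.jit
except ImportError:
    # Assumption for this sketch: run eagerly if Thunder is not installed.
    compile_fn = lambda fn: fn


class TinyMLP(nn.Module):
    """Hypothetical two-layer model used only for illustration."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


torch.manual_seed(0)
model = TinyMLP()
jmodel = compile_fn(model)  # compiled module, called like the original

x = torch.randn(2, 8)
expected = model(x)   # eager reference
actual = jmodel(x)    # compiled (or eager fallback) result

# The result is a regular torch.Tensor, so it feeds into normal PyTorch code.
probs = torch.softmax(actual, dim=-1)
print(probs.shape)
```

The compiled module is called exactly like the original one, which is what lets it sit inside an existing training or inference script.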
Thunder is in its early stages and should not be used for production runs yet.
However, it can already deliver outstanding performance for pretraining and finetuning LLMs supported by LitGPT, such as Mistral, Llama 2, Gemma, Falcon, and others.
Check out the LitGPT integration to learn about running LitGPT and Thunder together.
Given a Python callable or PyTorch module, Thunder can generate an optimized program that:
To do so, Thunder ships with:
Transformations such as `grad`, fusions, distributed (like `ddp`, `fsdp`), and functional (like `vmap`, `vjp`, `jvp`).

Thunder is written entirely in Python. Even its trace is represented as valid Python at all stages of transformation. This allows unprecedented levels of introspection and extensibility.
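One way to look at a trace, sketched here under the assumption that `thunder.last_traces` is available in your installed version, is to call the compiled function once and print the final trace, which is rendered as Python source. The fallback branch for a missing Thunder install is an assumption of this sketch and simply skips the inspection.

```python
import torch

try:
    import thunder
except ImportError:
    thunder = None  # assumption for this sketch: skip tracing without Thunder


def foo(a, b):
    return a + b


a = torch.full((2, 2), 1)
b = torch.full((2, 2), 3)

if thunder is not None:
    jfoo = thunder.jit(foo)
    result = jfoo(a, b)                 # traces are recorded on the first call
    traces = thunder.last_traces(jfoo)  # one trace per transformation stage
    trace_source = str(traces[-1])      # final trace, rendered as Python
    print(trace_source)
else:
    result = foo(a, b)
    trace_source = None
```

Printing earlier entries of `traces` shows how the program changes from stage to stage.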
Thunder doesn't generate code for accelerators, such as GPUs, directly. It acquires and transforms user programs so that it's possible to optimally select or generate device code using fast executors like nvFuser, torch.compile, cuDNN, and TransformerEngine.
Modules and functions compiled with Thunder fully interoperate with vanilla PyTorch and support PyTorch's autograd. Also, Thunder works alongside torch.compile to leverage its state-of-the-art optimizations.
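The autograd interoperability can be sketched as follows: a function compiled with `thunder.jit` returns a tensor that takes part in PyTorch's autograd, so calling `.backward()` on it fills in `.grad` as usual. The function `sum_of_squares` is a hypothetical example, and the eager fallback is an assumption added so the sketch runs without Thunder installed.

```python
import torch

try:
    import thunder
    compile_fn = thunder.jit
except ImportError:
    # Assumption for this sketch: run eagerly if Thunder is not installed.
    compile_fn = lambda fn: fn


def sum_of_squares(x):
    # Hypothetical example function: f(x) = sum(x_i ** 2)
    return (x * x).sum()


jfn = compile_fn(sum_of_squares)

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = jfn(x)
loss.backward()  # standard PyTorch autograd call on the compiled output

print(x.grad)  # gradient of sum(x^2) is 2 * x, i.e. tensor([2., 4., 6.])
```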
Online documentation is available. To build the documentation locally, run `make docs` and point your browser to the generated docs at `docs/build/index.html`.
We appreciate your feedback and contributions. If you have feature requests, questions, or want to contribute code or config files, please don't hesitate to use the GitHub Issue tracker.
We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.
Lightning Thunder is released under the Apache 2.0 license. See the LICENSE file for details.