Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
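For context, a minimal sketch of compiling a module with Thunder, assuming the public `thunder.jit` entry point and a hypothetical toy model (the benchmarks below use litgpt's falcon-7b instead):

```python
import torch
import torch.nn as nn
import thunder

# Toy module standing in for a real model (hypothetical example).
model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024)).cuda()

# thunder.jit traces the module and dispatches the trace to its executors
# (e.g. nvFuser, the torch executor) when the compiled module is called.
thunder_model = thunder.jit(model)

x = torch.randn(8, 1024, device="cuda", requires_grad=True)
y = thunder_model(x)   # first call triggers tracing/compilation
y.sum().backward()     # backward runs through the generated trace
```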
Thunder and ThunderFX are slower than torch.compile for FP8, falcon-7b, and other models #1365
🐛 Bug
As can be seen below, Thunder is slower than torch.compile for single-GPU training of falcon-7b:
Below are the results for ThunderFX with multi-GPU training:
The batch sizes and sharding modes don't match, but these are the fastest options for ThunderFX:
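ThunderFX here refers to driving Thunder through torch.compile (Dynamo) rather than calling thunder.jit directly; a minimal sketch of that path, assuming the `ThunderCompiler` backend exposed in `thunder.dynamo`:

```python
import torch
import torch.nn as nn
from thunder.dynamo import ThunderCompiler  # assumed import path for the ThunderFX backend

# Toy module for illustration (the reported numbers are for falcon-7b via litgpt).
model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024)).cuda()

# ThunderFX: Dynamo captures FX graphs and hands them to Thunder for compilation.
backend = ThunderCompiler()
thunderfx_model = torch.compile(model, backend=backend)

x = torch.randn(8, 1024, device="cuda")
out = thunderfx_model(x)
```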
To Reproduce
Steps to reproduce the behavior:
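As a stand-in for the original litgpt falcon-7b commands, a rough single-GPU timing comparison on a hypothetical toy model might look like:

```python
import time
import torch
import torch.nn as nn
import thunder

def bench(model, x, steps=50):
    # Warm up so compilation time does not count toward the measurement.
    for _ in range(5):
        model(x).sum().backward()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        model(x).sum().backward()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / steps

base = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()
x = torch.randn(16, 4096, device="cuda", requires_grad=True)

print("torch.compile:", bench(torch.compile(base), x))
print("thunder.jit:  ", bench(thunder.jit(base), x))
```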
Expected behavior
Thunder should be at least as fast as torch.compile.
Environment
system.device_product_name: DGXH100
system.gpu_driver_version: 535.129.03
libraries.cuda: 12.6.2.004
libraries.pip.lightning: 2.4.0.dev20240728
libraries.pip.lightning-thunder: 0.2.0.dev0
libraries.pip.lightning-utilities: 0.11.8
libraries.pip.litgpt: 0.4.11
libraries.pip.nvfuser: 0.2.20+git85c22a2
libraries.pip.pytorch-lightning: 2.4.0
libraries.pip.torch: 2.6.0a0+git96b30dc
libraries.pip.torchmetrics: 1.5.1
libraries.pip.torchvision: 0.19.0a0+d23a6e1