Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Apache License 2.0
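For context on what the compiler does, here is a minimal sketch of applying Thunder to a module (the toy model and shapes are placeholders, not taken from this issue; `thunder.jit` is the documented entry point):

```python
import torch
import thunder

# Toy module standing in for a real model; sizes are arbitrary.
model = torch.nn.Linear(64, 64)

# thunder.jit traces the module and dispatches the trace to Thunder's
# executors (e.g. nvFuser, torch) instead of running the original Python.
jitted = thunder.jit(model)

x = torch.randn(8, 64)
out = jitted(x)  # runs the Thunder-compiled version
```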

ThunderFX is significantly slower than 2 weeks ago for 3 models #1428

Open mpatel31415 opened 1 day ago

mpatel31415 commented 1 day ago

🐛 Bug

Here are recently found regressions:

[Attached image: table of the recently found regressions and the benchmark_litgpt.py parameters used]

To Reproduce

All parameters to benchmark_litgpt.py are visible in the attached image.
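The benchmark exercises the ThunderFX path, i.e. Thunder used as a `torch.compile` backend. A minimal sketch of that path is below (the toy model and shapes are placeholders; the real run uses the litgpt model and the parameters shown in the attached image):

```python
import torch
from thunder.dynamo import ThunderCompiler

# Stand-in model; the actual benchmark builds a litgpt transformer.
model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.GELU())

# ThunderFX: Dynamo captures FX graphs and hands them to Thunder.
backend = ThunderCompiler()
compiled = torch.compile(model, backend=backend)

out = compiled(torch.randn(4, 128))
out.sum().backward()
```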

Environment

system.device_product_name DGXH100
system.gpu_driver_version 535.129.03
libraries.cuda 12.6.98.001
libraries.pip.lightning 2.4.0.dev20240728
libraries.pip.lightning-thunder 0.2.0.dev0
libraries.pip.lightning-utilities 0.11.8
libraries.pip.litgpt 0.4.11
libraries.pip.nvfuser 0.2.22+gitba4f7d4
libraries.pip.pytorch-lightning 2.4.0
libraries.pip.torch 2.6.0a0+gita9b4989
libraries.pip.torchao 0.6.1
libraries.pip.torchmetrics 1.5.1
libraries.pip.torchvision 0.19.0a0+d23a6e1

IvanYashchuk commented 1 day ago

The difference seems to be due to a regression in the batch size used (2 -> 1). This could be related to the switch to the "block" bucketing mode instead of "none": for other models it increased memory usage and caused OOMs, and here it seems to have forced a smaller batch size that still fits. @kiya00, could you please take a look at this regression and find out what has caused it?
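For readers unfamiliar with the bucketing modes mentioned above, here is a sketch of how a benchmark-level "bucketing mode" string might map onto Thunder's FSDP bucketing (the helper name and the string-to-enum mapping are assumptions; `FSDPBucketingStrategy` and `thunder.distributed.fsdp` are Thunder's public API):

```python
import torch
import thunder
from thunder.distributed import FSDPBucketingStrategy

def apply_thunder_fsdp(model: torch.nn.Module, bucketing_mode: str):
    # Hypothetical mapping from the benchmark's string flag to the enum.
    strategy = {
        "none": FSDPBucketingStrategy.NONE,    # all-gather each parameter individually
        "layer": FSDPBucketingStrategy.LAYER,  # bucket parameters per layer
        "block": FSDPBucketingStrategy.BLOCK,  # bucket parameters per transformer block
    }[bucketing_mode]
    # Requires an initialized torch.distributed process group (e.g. via torchrun).
    sharded = thunder.distributed.fsdp(model, bucketing_strategy=strategy)
    return thunder.jit(sharded)
```

Larger buckets ("block") communicate fewer, bigger tensors but can keep more unsharded parameters live at once, which would be consistent with the higher memory usage observed above.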