mpatel31415 opened 1 day ago
The difference seems to be due to a regression in the batch size used (2 -> 1). This could be related to the switch from the "none" bucketing mode to "block". For other models that switch increased memory usage and caused OOMs; here it appears to have reduced the largest batch size that still works. @kiya00, could you please take a look at this regression and find out what caused it?
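The mechanism described above is that higher per-step memory forces the benchmark down to a smaller working batch size. As a rough illustration only (not the actual benchmark_litgpt.py logic), a search for the largest batch size that does not OOM might look like this; `run_step` is a hypothetical stand-in for one training step, and real code would catch `torch.cuda.OutOfMemoryError` rather than `MemoryError`:

```python
def find_max_batch_size(run_step, start=16):
    """Return the largest batch size <= start for which run_step succeeds,
    halving the candidate on every out-of-memory failure."""
    bs = start
    while bs >= 1:
        try:
            run_step(bs)
            return bs  # first size that does not OOM
        except MemoryError:  # real code: torch.cuda.OutOfMemoryError
            bs //= 2
    return 0  # nothing fits


# Toy stand-in mimicking the regression: any batch size above 1 now OOMs,
# so the search that previously settled on 2 settles on 1.
def fake_step(bs):
    if bs > 1:
        raise MemoryError(f"OOM at batch size {bs}")


print(find_max_batch_size(fake_step, start=2))  # -> 1
```

Under this model, a memory-usage increase from the bucketing-mode change is enough to flip the result from 2 to 1 without any change to the search itself.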
🐛 Bug
Here are the recently found regressions:
To Reproduce
All parameters passed to benchmark_litgpt.py are visible in the attached image.
Environment
system.device_product_name: DGXH100
system.gpu_driver_version: 535.129.03
libraries.cuda: 12.6.98.001
libraries.pip.lightning: 2.4.0.dev20240728
libraries.pip.lightning-thunder: 0.2.0.dev0
libraries.pip.lightning-utilities: 0.11.8
libraries.pip.litgpt: 0.4.11
libraries.pip.nvfuser: 0.2.22+gitba4f7d4
libraries.pip.pytorch-lightning: 2.4.0
libraries.pip.torch: 2.6.0a0+gita9b4989
libraries.pip.torchao: 0.6.1
libraries.pip.torchmetrics: 1.5.1
libraries.pip.torchvision: 0.19.0a0+d23a6e1