Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It lets you use different hardware executors at once, across one or thousands of GPUs.
Apache License 2.0
1.07k stars, 60 forks

Bumps cudnn FE to v1.5.2 #651

Closed: vedaanta closed this 2 days ago

vedaanta commented 3 days ago
## What does this PR do?

Updates cudnn-fe to 1.5.2.

## PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

## Did you have fun?

Make sure you had fun coding 🙃

Borda commented 3 days ago

Running in https://dev.azure.com/Lightning-AI/lightning/_build/results?buildId=206791&view=results

t-vi commented 2 days ago

@vedaanta Do you know what to do about the failures in the nightly check: https://dev.azure.com/Lightning-AI/lightning/_build/results?buildId=206808&view=logs&j=5b0799f7-725e-5b16-9b83-c0a5a25d03f0&t=97651ec4-0b0f-5455-bbb5-3c30427a0a7e? To me, they look somewhat similar to the failures for which I disabled the cuDNN tests on 2.4dev until last week.

vedaanta commented 2 days ago

Yes, the commit that caused the earlier SDPA failure on PyTorch main has been merged again. Thunder CI breaks with 2.5.0a0+git92be340; it worked fine with 2.5.0a0+gitb1f486a.

The suspicious commit between them is the one that enables cuDNN SDPA by default on PyTorch main.

I will take a look and let the PyTorch devs know.

t-vi commented 2 days ago

Thank you @vedaanta @Borda