Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors at once; across one or thousands of GPUs.
Apache License 2.0

reenable testing cudnn SDPA with PyTorch dev version / 2.4.0a0+ #567

Closed by t-vi 4 months ago

t-vi commented 4 months ago

I have disabled a number of tests failing with 2.4.0a0+. I imagine it is a PyTorch thing, but I'm a bit concerned nonetheless.

The CI job for CUDA with the PyTorch main branch has been failing with cuDNN errors since roughly today (European time). It fails on every run, not randomly.

e.g. https://dev.azure.com/Lightning-AI/lightning/_build/results?buildId=205024&view=logs&j=5b0799f7-725e-5b16-9b83-c0a5a25d03f0&t=97651ec4-0b0f-5455-bbb5-3c30427a0a7e&l=11885

I don't yet know what happened.

FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_grad_forward_scaled_dot_product_attention_nvfuser_cuda_float16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_nvfuser_cuda_float16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_cudnn_executor.py::test_cudnn_vs_torch_consistency_grad_forward_scaled_dot_product_attention_torch_cuda_bfloat16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_nvfuser_cuda_bfloat16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_grad_forward_scaled_dot_product_attention_torch_cuda_bfloat16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_torch_cuda_bfloat16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_cudnn_executor.py::test_cudnn_vs_torch_consistency_grad_forward_scaled_dot_product_attention_torch_cuda_float16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_grad_forward_scaled_dot_product_attention_nvfuser_cuda_bfloat16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_grad_forward_scaled_dot_product_attention_torch_cuda_float16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_torch_cuda_float16 - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_cudnn_executor.py::test_vjp_correctness_cudnn_sdpa[bfloat16-may-cat-grad-qkv] - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_cudnn_executor.py::test_vjp_correctness_cudnn_sdpa[float16-may-cat-grad-qkv] - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_cudnn_executor.py::test_vjp_correctness_cudnn_sdpa[bfloat16-never-cat-grad-qkv] - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM
FAILED thunder/tests/test_cudnn_executor.py::test_vjp_correctness_cudnn_sdpa[float16-never-cat-grad-qkv] - RuntimeError: cuDNN Frontend error: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM

cc @borda

t-vi commented 4 months ago

This is upstream at https://github.com/pytorch/pytorch/pull/128350

t-vi commented 4 months ago

With the PyTorch PR merged, we traded the old errors for a number of vjp correctness errors (mismatched results) in thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_torch_cuda_bfloat16, plus one failure of a different kind: FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_grad_forward_scaled_dot_product_attention_torch_cuda_float16 - RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans built successfully..

While investigating the latter on my local machine, it seems flaky and prone to hang in the second of the three tests (float16, bfloat16, float32).

vedaanta commented 4 months ago

@t-vi cudnn sdpa is now disabled on PyTorch main altogether. Moreover, PyTorch main has moved to 2.5, and the thunder CI is running the cudnn sdpa tests successfully as of today. (Link to thunder CI logs) (Link to revert commit)

I think we can re-enable the sdpa tests for 2.4 too, in case those still affect CI. Or at least remove those pytest skipif markers.
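For readers unfamiliar with the kind of version-gated skip being discussed, here is a minimal sketch. It is not the actual thunder test code: the helper names `torch_major_minor` and `should_skip_cudnn_sdpa` are hypothetical, and the assumption that only 2.4 builds need skipping is taken from the thread, not from the test suite.

```python
import re


def torch_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.4.0a0+git1234abc'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def should_skip_cudnn_sdpa(version: str) -> bool:
    """Hypothetical predicate: skip only on 2.4 builds, where the failures were reported."""
    return torch_major_minor(version) == (2, 4)


# Hypothetical usage in a test module (needs pytest and torch installed):
#
#   @pytest.mark.skipif(
#       should_skip_cudnn_sdpa(torch.__version__),
#       reason="cudnn SDPA failures reported with PyTorch 2.4 dev builds",
#   )
#   def test_cudnn_sdpa(): ...
```

Removing the skip then amounts to deleting the decorator, or narrowing the predicate once the affected builds are no longer tested.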

a number of vjp correctness errors about different results in thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_torch_cuda_bfloat16 and one FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_grad_forward_scaled_dot_product_attention_torch_cuda_float16 - RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans built successfully..

Can you please provide details about your environment? Mainly the output of:

import cudnn
print(cudnn.__version__)
print(cudnn.backend_version_string())

The "RuntimeError: cuDNN Frontend error:" prefix makes me think that this is an old version of thunder or cudnn-frontend. Currently, cudnnex is supposed to reject cudnn-frontend versions before 1.3.
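A small sketch of how the requested environment details could be gathered and compared against the 1.3 minimum mentioned above. It assumes the `cudnn` Python module exposes `__version__` and `backend_version_string()` as used in the snippet in this comment; the helper names (`parse_release`, `frontend_is_supported`, `describe_cudnn_environment`) are hypothetical, and this is not how cudnnex itself performs the check.

```python
import importlib

# From the thread: cudnn-frontend versions before 1.3 are supposed to be rejected.
MIN_FRONTEND = (1, 3)


def parse_release(version: str) -> tuple[int, ...]:
    """Turn '1.3.0' into (1, 3, 0); stop at the first component with no leading digits."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def frontend_is_supported(version: str, minimum: tuple[int, ...] = MIN_FRONTEND) -> bool:
    """Compare only as many components as the minimum specifies."""
    return parse_release(version)[: len(minimum)] >= minimum


def describe_cudnn_environment() -> dict:
    """Report cudnn frontend/backend versions, or note that the module is missing."""
    try:
        cudnn = importlib.import_module("cudnn")
    except ImportError:
        return {"installed": False}
    frontend = str(getattr(cudnn, "__version__", "unknown"))
    backend_fn = getattr(cudnn, "backend_version_string", None)
    backend = backend_fn() if callable(backend_fn) else "unknown"
    return {
        "installed": True,
        "frontend": frontend,
        "backend": backend,
        "frontend_supported": frontend_is_supported(frontend),
    }


if __name__ == "__main__":
    print(describe_cudnn_environment())
```

Pasting the printed dictionary into the issue would answer the question about the environment in one step.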