vjp correctness fails for sdpa manual grad forward sdpa

From the CI run on an intermediate version of #691:

FAILED thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_nvfuser_cuda_thunder.dtypes.float16 - AssertionError: Tensor-likes are not close!

Mismatched elements: 19624 / 245760 (8.0%)
Greatest absolute difference: 0.125 at index (1, 1, 60, 64) (up to 1e-05 allowed)
Greatest relative difference: inf at index (0, 0, 54, 6) (up to 0.001 allowed)

FAILED thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_nvfuser_cuda_thunder.dtypes.bfloat16 - AssertionError: Tensor-likes are not close!

Mismatched elements: 7563 / 180224 (4.2%)
Greatest absolute difference: 1.046875 at index (6, 0, 86, 73) (up to 1e-05 allowed)
Greatest relative difference: inf at index (0, 1, 22, 61) (up to 0.016 allowed)

= 2 failed, 4667 passed, 823 skipped, 108 xfailed, 96 xpassed, 119791 warnings in 594.94s (0:09:54) =
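For anyone reading the numbers in the log, here is a minimal sketch of how I understand the closeness check behind "Tensor-likes are not close!". It assumes the usual rule `|actual - expected| <= atol + rtol * |expected|`. The helper names and sample values are hypothetical, not taken from the test. An `inf` relative difference would then simply mean the reference value at that index is zero while the computed value is not.

```python
import math


def within_tolerance(actual, expected, rtol, atol):
    """Closeness rule (assumed): |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def relative_difference(actual, expected):
    """Absolute difference divided by |expected|; infinite when expected is 0."""
    diff = abs(actual - expected)
    if expected == 0:
        return math.inf if diff > 0 else 0.0
    return diff / abs(expected)


# Tolerances copied from the float16 failure message.
FP16_RTOL, FP16_ATOL = 1e-3, 1e-5

# Hypothetical element: the reference gradient is exactly 0 and the computed
# one is off by 0.125 (the magnitude of the greatest absolute difference).
actual, expected = 0.125, 0.0

print(within_tolerance(actual, expected, FP16_RTOL, FP16_ATOL))
print(relative_difference(actual, expected))
```

With a zero reference value the `rtol` term vanishes, so only the `atol` of 1e-05 applies to that element.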

@vedaanta I tentatively assigned it to you because #691 is your PR...

Lightning-AI/lightning-thunder issue #703