Lightning-AI / lightning-thunder


`test_autocast_matmul` is flaky #850

Closed by nikitaved 3 months ago

nikitaved commented 3 months ago

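As per the title: the test fails intermittently. Repeating it 300 times across 30 parallel workers reproduces the failures: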

pytest --show-progress --verbose thunder/tests/test_autocast.py -k test_autocast_torch_matmul --count 300 -n 30
...
========================================================================================================== short test summary info ===========================================================================================================
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype0-cpu-True-104-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype0-cpu-True-64-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype0-cpu-True-140-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype0-cuda-True-71-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cpu-True-6-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cpu-True-7-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cpu-True-169-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cpu-True-157-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cpu-True-37-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cpu-True-100-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cuda-True-82-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cuda-True-65-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cuda-True-126-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cuda-True-197-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cuda-True-264-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cuda-True-256-300] - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_autocast.py::test_autocast_torch_matmul[b_dtype1-cuda-True-231-300] - AssertionError: Tensor-likes are not close!
============================================================================================== 17 failed, 2383 passed, 12241 warnings in 34.75s ==============================================================================================

cc @borda, @kshitij12345

nikitaved commented 3 months ago

OK, the fix should be simple. On it.
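
The thread does not state the root cause or the fix. As a rough illustration only, the sketch below shows one common way a "Tensor-likes are not close!" check can pass on most runs and fail on a few: a reduced-precision result is compared against a higher-precision reference with a fixed relative tolerance, while the inputs are random. Everything here is an assumption made for the example: `to_half`, `half_dot`, `relative_error`, and the `rtol` value are hypothetical and are not taken from Thunder's test suite. Accumulating in float16 is also a deliberately crude model and not how autocast kernels necessarily behave.

```python
import random
import struct


def to_half(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


def half_dot(a, b):
    """Dot product with inputs, products, and the running sum rounded to float16."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_half(to_half(x) * to_half(y))
        acc = to_half(acc + product)
    return acc


def relative_error(seed: int, n: int = 256) -> float:
    """Relative error of the float16 dot product for one random input pair."""
    rng = random.Random(seed)
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    exact = sum(x * y for x, y in zip(a, b))  # float64 reference
    return abs(half_dot(a, b) - exact) / max(abs(exact), 1e-12)


# One "run" per seed, mirroring a test repeated many times on fresh random data.
errors = [relative_error(seed) for seed in range(300)]
rtol = 1e-2  # hypothetical tolerance, chosen only for this illustration
failures = sum(e > rtol for e in errors)
print(f"{failures} of {len(errors)} seeds exceed rtol={rtol}")
```

Which seeds cross the tolerance depends entirely on the random data, so a test written this way can look healthy in a single run and only show failures when repeated, as in the `--count 300` reproduction above. Typical remedies are fixing the seed, loosening the tolerance for low-precision dtypes, or comparing against a reference computed in the same dtype. Whether any of these matches the actual fix for this issue is not stated in the thread.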