ROCm / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

Fix SWDEV-459621. #1427

Closed xinyazhang closed 1 month ago

xinyazhang commented 1 month ago

The UT does not skip the efficient attention operator on platforms that do not support it; support for flash attention does not guarantee that efficient attention is also implemented.

Fixes #SWDEV-459621
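
For context, a minimal sketch of the gating pattern this fix describes, assuming the capability flags exposed by torch.testing._internal.common_cuda; the class and test names here are hypothetical and this is not the PR's actual diff:

import unittest

from torch.testing._internal.common_cuda import (
    PLATFORM_SUPPORTS_FLASH_ATTENTION,
    PLATFORM_SUPPORTS_MEM_EFF_ATTENTION,
)


class AttentionSkipSketch(unittest.TestCase):
    # Insufficient: flash attention support does not imply that the
    # efficient (mem-efficient) attention kernel is also available.
    @unittest.skipIf(not PLATFORM_SUPPORTS_FLASH_ATTENTION,
                     "flash attention not supported")
    def test_gated_on_flash_only(self):
        ...

    # Correct: query the efficient attention capability directly, so the
    # test is skipped on platforms (e.g. some ROCm targets) that lack it.
    @unittest.skipIf(not PLATFORM_SUPPORTS_MEM_EFF_ATTENTION,
                     "efficient attention not supported")
    def test_gated_on_mem_eff(self):
        ...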

xinyazhang commented 1 month ago

Tested on rocm-framework-51

(py_3.10) xinyazha@12fd1640b6b7:~/rocm-pytorch$ PYTORCH_TEST_WITH_ROCM=1 python test/distributed/_tensor/test_attention.py -k test_ring_attention_compile_attention_fn1 -v
test_ring_attention_compile_attention_fn1 (__main__.RingAttentionTest) ... [rank1]:[W530 08:06:08.493322199 ProcessGroupNCCL.cpp:1113] WARNING: process group has NOT been destroyed before it is being destructed. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL data transfers have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4
[rank0]:[W530 08:06:08.539286216 ProcessGroupNCCL.cpp:1113] WARNING: process group has NOT been destroyed before it is being destructed. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL data transfers have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4
skipped 'Test skipped at subprocess level, look at subprocess log for skip reason'

----------------------------------------------------------------------
Ran 1 test in 4.512s

OK (skipped=1)