ROCm / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

Mitigates SWDEV-459618 #1430

Closed xinyazhang closed 1 month ago

xinyazhang commented 4 months ago

_c10d_functional_autograd::all_to_all_single does not appear to be implemented on ROCm.

Note: a separate, still-unfixed problem is the mismatch between the outputs of torch.ops.aten._scaled_dot_product_flash_attention and _scaled_dot_product_chunk_flash_attention. Both problems need to be fixed to enable this UT.
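For context on why those two ops are expected to agree: a chunked flash-attention kernel processes the key/value sequence in blocks with an online softmax, and should be numerically equivalent (up to floating-point tolerance) to attention computed in one pass. The sketch below is a minimal NumPy illustration of that equivalence check, not the actual ATen kernels; the function names and shapes are illustrative assumptions.

```python
import numpy as np

def attention(Q, K, V):
    # Reference: full softmax(Q K^T / sqrt(d)) V in one pass.
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def chunked_attention(Q, K, V, chunk=4):
    # Flash-attention-style pass over K/V in chunks, using an
    # online softmax: keep a running max, denominator, and
    # unnormalized output, rescaling as the max changes.
    d = Q.shape[-1]
    n = K.shape[0]
    m = np.full((Q.shape[0], 1), -np.inf)   # running row max
    l = np.zeros((Q.shape[0], 1))           # running softmax denominator
    O = np.zeros((Q.shape[0], V.shape[1]))  # running unnormalized output
    for start in range(0, n, chunk):
        Kc, Vc = K[start:start + chunk], V[start:start + chunk]
        S = Q @ Kc.T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)           # rescale previous partials
        P = np.exp(S - m_new)
        l = l * scale + P.sum(axis=-1, keepdims=True)
        O = O * scale + P @ Vc
        m = m_new
    return O / l

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
ref = attention(Q, K, V)
out = chunked_attention(Q, K, V)
# The chunked result should match the reference within FP tolerance;
# the UT mentioned above performs an analogous comparison on GPU.
assert np.allclose(ref, out, atol=1e-6)
```

A mismatch larger than such a tolerance between the two ATen ops is the kind of failure the unit test would surface.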

Fixes SWDEV-459618

pruthvistony commented 1 month ago

I believe the cherry-pick into rocm6.3_internal_testing is required.