ROCm / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License
Fix a bug in fused_dense_cuda on ROCm #99

Closed: hubertlu-tw closed this 1 year ago

hubertlu-tw commented 1 year ago

With Kk's help, we found a bug in fused_dense_cuda on ROCm.
Before this fix, the fused_dense unit test failed on ROCm.

Steps to reproduce:

$ cd apex/contrib/test/fused_dense
$ pytest test_fused_dense.py