athitten closed 2 years ago
jenkins: retest this please
jenkins: retest this please
The failing test is one of the distributed unit tests. The commands are:
$ python -m torch.distributed.launch --nproc_per_node=2 amp_master_params.py
$ python compare.py
The error is not introduced by the changes in this PR.
The test failed in apex-rocm-pytorch-master, where PyTorch is built from the tip of the ROCm master branch (4a1785), but passed in apex-rocm-pytorch-release (rocm/pytorch:latest = rocm/pytorch:rocm5.0.1_ubuntu18.04_py3.7_pytorch_staging).
Updated all the MHA files to pass the rocblas_alt_impl flag in the bwd rocblas calls. Ran all the unit tests, and all of them passed.