Recent PyTorch commits cause some unit tests to fail with "if cached_x.grad_fn.next_functions[1][0].variable is not x: IndexError: tuple index out of range"; the same failure is also observed on CUDA (upstream). For now, we skip the following unit tests and will re-enable them later.
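Here is a minimal sketch of the skip-then-re-enable approach, assuming the affected tests are standard `unittest` test cases. The `unittest.skip` decorator marks a test as skipped with a reason, so the test still shows up in the report instead of being silently dropped. The class and method names below are hypothetical placeholders, not the actual affected tests.

```python
import unittest

# Hypothetical reason string; the real skipped tests are the ones listed in this PR.
SKIP_REASON = (
    "IndexError: tuple index out of range in "
    "cached_x.grad_fn.next_functions[1][0]; re-enable after upstream fix"
)


class ExampleAffectedTest(unittest.TestCase):  # hypothetical test class
    @unittest.skip(SKIP_REASON)
    def test_affected(self):
        # Never executed while the decorator is present.
        self.fail("should not run while skipped")

    def test_unaffected(self):
        # Other tests in the same class keep running normally.
        self.assertTrue(True)
```

Deleting the decorator re-enables the test once the upstream fix lands.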
Additionally, test_half (test_fused_optimizer.TestFusedAdam) fails only on ROCm. After apex.optimizers.FusedAdam updates its parameters, NaNs sporadically appear in the outputs, even though about 99% of the values match the outputs of torch.optim.Adam. We are currently investigating the issue.
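To put a number on this symptom, a small helper can count the NaNs in the FusedAdam outputs and compute the fraction of values that match the torch.optim.Adam reference. The following is a hypothetical, stdlib-only sketch that works on plain lists of floats. Real tensors would need converting first, for example with `.tolist()`. The helper name and tolerances are assumptions and do not reflect what test_half actually uses.

```python
import math


def compare_outputs(reference, candidate, rel_tol=1e-3, abs_tol=1e-5):
    """Return (nan_count, match_fraction) for two equal-length float sequences.

    Hypothetical helper: the tolerances are illustrative only.
    """
    if len(reference) != len(candidate):
        raise ValueError("length mismatch")
    if not reference:
        return 0, 1.0
    # NaNs produced by the optimizer under test.
    nan_count = sum(1 for v in candidate if math.isnan(v))
    # Non-NaN values that agree with the reference within tolerance.
    matches = sum(
        1
        for r, c in zip(reference, candidate)
        if not math.isnan(c) and math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
    )
    return nan_count, matches / len(reference)


# Usage: one sporadic NaN among 100 otherwise identical values.
ref = [0.1 * i for i in range(100)]
out = list(ref)
out[7] = float("nan")
print(compare_outputs(ref, out))  # (1, 0.99)
```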