Closed pragupta closed 3 months ago
I see this one failing consistently with "Tensor-likes are not equal":
python test/distributed/test_c10d_gloo.py -k test_reduce_stress_cuda
These are flaky. They pass when I run them individually, but they time out when run together as a batch:
test/distributed/fsdp/test_fsdp_clip_grad_norm.py -k test_no_gradients
test/distributed/fsdp/test_fsdp_optim_state.py -k test_optim_state_dict_nested
test/distributed/fsdp/test_fsdp_optim_state.py -k test_scatter_full_optim_state_dict
test/distributed/fsdp/test_fsdp_optim_state.py -k test_rekey_optim_state_dict
test/distributed/fsdp/test_fsdp_optim_state.py -k test_shard_full_optim_state_dict
test/distributed/fsdp/test_fsdp_optim_state.py -k test_full_optim_state
test/distributed/fsdp/test_fsdp_use_orig_params.py -k test_access_params_after_forward
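One way to quantify the flakiness is to rerun each selection in isolation several times under a timeout and tally the outcomes. The following is only a rough sketch, not part of the PyTorch test tooling: the helper names `run_once` and `tally`, the repeat count, and the 600-second timeout are all assumptions chosen for illustration, and it assumes it is launched from the repository root.

```python
import subprocess
import sys

# Test selections copied from the flaky list above.
FLAKY_TESTS = [
    ("test/distributed/fsdp/test_fsdp_clip_grad_norm.py", "test_no_gradients"),
    ("test/distributed/fsdp/test_fsdp_optim_state.py", "test_optim_state_dict_nested"),
    ("test/distributed/fsdp/test_fsdp_optim_state.py", "test_scatter_full_optim_state_dict"),
    ("test/distributed/fsdp/test_fsdp_optim_state.py", "test_rekey_optim_state_dict"),
    ("test/distributed/fsdp/test_fsdp_optim_state.py", "test_shard_full_optim_state_dict"),
    ("test/distributed/fsdp/test_fsdp_optim_state.py", "test_full_optim_state"),
    ("test/distributed/fsdp/test_fsdp_use_orig_params.py", "test_access_params_after_forward"),
]


def run_once(cmd, timeout_s):
    """Run one command and classify it as 'pass', 'fail', or 'timeout'."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising.
        return "timeout"
    return "pass" if proc.returncode == 0 else "fail"


def tally(cmd, repeats, timeout_s):
    """Run the same command `repeats` times and count each outcome."""
    counts = {"pass": 0, "fail": 0, "timeout": 0}
    for _ in range(repeats):
        counts[run_once(cmd, timeout_s)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical settings: 3 repeats, 600 s per run.
    for path, name in FLAKY_TESTS:
        cmd = [sys.executable, path, "-k", name]
        print(path, name, tally(cmd, repeats=3, timeout_s=600))
```

If a selection always passes here but times out in the full run, that would point toward interference between tests (leftover processes, ports, or GPU memory) rather than a problem in the test itself. Note that the timeout only kills the direct child, so worker processes spawned by a distributed test may need separate cleanup.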