Closed frabu6 closed 1 year ago
Summary: The current code assumes that training runs on a single node, so that global rank 0 is always present on every node. This assumption fails in multi-node training, where only one node hosts global rank 0, and lookups on the other nodes fail with a key error for key 0.
Reviewed By: crassirostris
Differential Revision: D46841286
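
To illustrate the failure mode described in the summary, here is a minimal sketch. It is not the d2go code: all function names are hypothetical, and it assumes the per-node results live in a dict keyed by global rank. Picking the lowest rank present on the node is only one possible way to avoid the hard-coded key, not necessarily what this change does.

```python
from typing import Dict, List


def ranks_on_node(node_index: int, gpus_per_node: int) -> List[int]:
    """Global ranks hosted by one node (hypothetical helper)."""
    start = node_index * gpus_per_node
    return list(range(start, start + gpus_per_node))


def collect_per_rank_results(node_index: int, gpus_per_node: int) -> Dict[int, str]:
    """Stand-in for per-process results gathered on a node, keyed by global rank."""
    return {
        rank: f"result-from-rank-{rank}"
        for rank in ranks_on_node(node_index, gpus_per_node)
    }


def main_result_buggy(results: Dict[int, str]) -> str:
    # Assumes global rank 0 is on this node; true only for node 0.
    return results[0]


def main_result_fixed(results: Dict[int, str]) -> str:
    # Uses the lowest global rank actually present on this node.
    return results[min(results)]


if __name__ == "__main__":
    node1 = collect_per_rank_results(node_index=1, gpus_per_node=8)
    try:
        main_result_buggy(node1)
    except KeyError as err:
        print(f"KeyError: {err}")  # node 1 holds ranks 8..15, so key 0 is missing
    print(main_result_fixed(node1))
```

On node 0 both lookups return the same entry, which is why the problem only shows up once training spans more than one node.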
This pull request was exported from Phabricator. Differential Revision: D46841286
This pull request has been merged in facebookresearch/d2go@783288394b9ac27b63cc816f751c2f4d6efe8fdc.