dajiji opened this issue 3 years ago (Open)
I would like to ask about the same problem:
```python
>>> import torch
>>> import inspect
>>> inspect.getfullargspec(torch.distributed.reduce_scatter).args
['output', 'input_list', 'op', 'group', 'async_op']
```
torch.distributed.reduce_scatter has no 'no_copy' parameter; the code is here: https://github.com/pytorch/pytorch/blob/v1.5.0/torch/distributed/distributed_c10d.py#L1425
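For context, here is a minimal sketch of how one could probe at runtime whether the installed PyTorch's reduce_scatter exposes a `no_copy` keyword, which is essentially what the apex assertion checks. The helper name `reduce_scatter_has_no_copy` is mine, not part of apex or PyTorch:

```python
import inspect

import torch.distributed as dist


def reduce_scatter_has_no_copy() -> bool:
    """Return True if this PyTorch build's reduce_scatter accepts `no_copy`.

    Stock PyTorch wheels only expose
    ['output', 'input_list', 'op', 'group', 'async_op'],
    so this returns False on them.
    """
    try:
        spec = inspect.getfullargspec(dist.reduce_scatter)
    except TypeError:
        # Some C-implemented callables are not introspectable.
        return False
    return "no_copy" in spec.args or "no_copy" in spec.kwonlyargs


if __name__ == "__main__":
    print("no_copy supported:", reduce_scatter_has_no_copy())
```

If this prints `False`, the apex assertion at the line linked below will fail with that PyTorch build.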
Hi APEX team, can you please suggest how to work around the failed "c10d no_copy" assertion in https://github.com/NVIDIA/apex/blob/master/apex/contrib/optimizers/distributed_fused_lamb.py#L140?
Perhaps it was deprecated after NVIDIA's run this June. Which pip wheel from https://pytorch.org/ should I use? Or how can I get a PyTorch build with c10d no_copy support?
Here's how to reproduce it: run the NVIDIA MLPerf PyTorch BERT code with Python 3.9 + CUDA 11.1 + PyTorch 1.9 / 1.8 / 1.10 nightly + apex master. All of these fail.
Command:
Appreciate any suggestions.