NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

[Transformer] Update p2p communication routine #1650

Closed Aidyn-A closed 1 year ago

Aidyn-A commented 1 year ago

A change to the coalescing manager affects batch_isend_irecv: https://github.com/pytorch/pytorch/pull/98793. With this change, batch_isend_irecv returns only one handle (reqs is now a single-element list), so _run_p2pops must be adjusted accordingly.
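To illustrate the kind of adjustment involved, here is a minimal sketch, not the actual apex patch. It assumes the caller only needs to wait on whatever handles come back, so it iterates over reqs instead of relying on one handle per P2P op. The names wait_for_p2p_requests and FakeWork are hypothetical; FakeWork stands in for the work handles returned by torch.distributed.batch_isend_irecv so the example runs without a process group.

```python
from typing import Iterable, Protocol


class SupportsWait(Protocol):
    """Anything exposing wait(), like the handles from batch_isend_irecv."""

    def wait(self) -> None: ...


def wait_for_p2p_requests(reqs: Iterable[SupportsWait]) -> int:
    """Hypothetical helper: wait on every returned handle.

    Makes no assumption about len(reqs), so it handles both the
    one-handle-per-op list and a single coalesced handle.
    Returns the number of handles waited on.
    """
    count = 0
    for req in reqs:
        req.wait()
        count += 1
    return count


class FakeWork:
    """Stand-in for a distributed work handle (assumption, for illustration)."""

    def __init__(self) -> None:
        self.done = False

    def wait(self) -> None:
        self.done = True


# Older behavior: one handle per op (here, four ops).
per_op = [FakeWork() for _ in range(4)]
# Behavior described above: a single handle covering the whole batch.
coalesced = [FakeWork()]

n_old = wait_for_p2p_requests(per_op)
n_new = wait_for_p2p_requests(coalesced)
```

Code that indexes reqs by op position, or asserts that len(reqs) equals the number of ops, would break under the new behavior; iterating over the list avoids both assumptions.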

cc @crcrpar @ptrblck