Open hanbinhu opened 3 years ago
The ready event in neighbor_allreduce with dst_weight makes sure the data_weight computation has finished before communication starts, because PyTorch's CUDA stream is not synchronized with our CUDA stream.
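
For illustration, the same pattern can be sketched in plain PyTorch: record an event on the stream that runs the weighting, then make the communication stream wait on that event. This is only a minimal sketch, not the project's actual code. `weighted_send_buffer` and `comm_stream` are hypothetical names, the `clone()` stands in for the real communication call, and the tensor is assumed to live on the current CUDA device.

```python
import torch


def weighted_send_buffer(tensor, dst_weight):
    """Scale `tensor` by `dst_weight`, then consume the result on another stream."""
    if not tensor.is_cuda:
        # CPU path: no CUDA streams are involved, so no event is needed.
        return tensor * dst_weight

    # Stream on which PyTorch enqueues the weighting kernel.
    compute_stream = torch.cuda.current_stream(tensor.device)
    # Hypothetical stand-in for the library's own communication stream.
    comm_stream = torch.cuda.Stream(device=tensor.device)
    ready_event = torch.cuda.Event()

    weighted = tensor * dst_weight       # enqueued on compute_stream
    ready_event.record(compute_stream)   # marks the end of the weighting work
    comm_stream.wait_event(ready_event)  # GPU-side wait; the host is not blocked

    with torch.cuda.stream(comm_stream):
        # Stand-in for the communication call that reads `weighted`.
        send_buffer = weighted.clone()

    # Tell the caching allocator that `weighted` is also used on comm_stream.
    weighted.record_stream(comm_stream)
    # Block the host until the stand-in communication has finished.
    comm_stream.synchronize()
    return send_buffer
```

Without the `wait_event` call, the work on `comm_stream` could start reading `weighted` before the multiplication on the PyTorch stream has completed, which is the race the ready event is meant to prevent.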