As with the one-sided ops, the tests need an allreduce op as a barrier before the GPU allgather-related ops.
Otherwise, a segmentation fault occurs.
CPU ops do not need this barrier, and it is also unnecessary when all ops are placed on "cuda:0". Open question: why is it necessary in multi-GPU scenarios?