BichengYing closed this issue 4 years ago.
Maybe we can try a thread pool?
A thread pool is not feasible, because the win_ops conflict with multithreading.
The only viable approach is the one Horovod uses: let rank 0 act as the master and coordinate with the other ranks.
This can be solved by adding a negotiation stage.
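A minimal sketch of what the coordinator side of such a negotiation stage could look like, written as plain Python with no MPI so the logic is easy to follow. It assumes each rank reports the names of the tensors it has ready (in its local order) to rank 0, and that rank 0 replies with one schedule that every rank then executes. The function name `negotiate`, the tensor names, and the choice to order by rank 0's own submission order are all hypothetical illustrations, not the actual implementation.

```python
from typing import Dict, List


def negotiate(ready: Dict[int, List[str]]) -> List[str]:
    """Hypothetical coordinator (rank 0) step of a negotiation stage.

    `ready` maps each rank to the tensor names it has submitted for a
    collective, in that rank's local order. Returns the schedule rank 0
    would broadcast: names ready on every rank, in rank 0's order.
    """
    # Only tensors that every rank has submitted are safe to reduce now.
    common = set.intersection(*(set(names) for names in ready.values()))
    # Impose one global order (rank 0's) so all ranks issue identical calls.
    return [name for name in ready[0] if name in common]


ready = {
    0: ["layer1.weight", "layer1.bias", "layer2.weight"],
    1: ["layer1.bias", "layer1.weight"],  # same tensors, different local order
}
print(negotiate(ready))  # ['layer1.weight', 'layer1.bias']
```

Tensors that are not yet ready everywhere (here `layer2.weight`) would simply stay pending until a later negotiation round.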
Our implementation requires that global collective ops are always issued in the same order on every process. If the order is not the same (for example, rank 0 calls MPI_Allreduce for the layer 1 weight while another rank calls MPI_Allreduce for the layer 1 bias), MPI will either abort or hang.
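MPI matches collective calls on a communicator purely by call order: the k-th collective on one rank is paired with the k-th collective on every other rank, regardless of which tensor each rank meant to reduce. The sketch below simulates that pairing in plain Python (no MPI involved) to locate the first position where ranks disagree. The function `first_order_mismatch` and the op names are hypothetical and only illustrate the ordering requirement.

```python
from typing import Dict, List, Optional, Tuple


def first_order_mismatch(
    calls: Dict[int, List[str]],
) -> Optional[Tuple[int, Dict[int, Optional[str]]]]:
    """Return (position, {rank: op}) for the first collective on which the
    ranks disagree, or None if every rank issues the same sequence.

    A rank that has no call at that position is reported as None; in real
    MPI the other ranks would block there waiting for it (a hang).
    """
    shortest = min(len(seq) for seq in calls.values())
    for k in range(shortest):
        ops = {rank: seq[k] for rank, seq in calls.items()}
        if len(set(ops.values())) > 1:
            return k, ops
    # All shared positions match; check for ranks that issue extra calls.
    if any(len(seq) != shortest for seq in calls.values()):
        return shortest, {
            rank: (seq[shortest] if len(seq) > shortest else None)
            for rank, seq in calls.items()
        }
    return None


calls = {
    0: ["allreduce:layer1.weight", "allreduce:layer1.bias"],
    1: ["allreduce:layer1.bias", "allreduce:layer1.weight"],
}
print(first_order_mismatch(calls))
```

Here position 0 is reported as a mismatch: rank 0's weight allreduce would be paired with rank 1's bias allreduce. A schedule produced by a negotiation stage gives every rank the same sequence, so this check would return None.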