Bluefog-Lib / bluefog

Distributed and decentralized training framework for PyTorch over graph
https://bluefog-lib.github.io/bluefog/
Apache License 2.0
291 stars 71 forks

Mac + OpenMPI 4.0.5 Failed on Window test #42

Open Bluefog-Lib opened 4 years ago

Bluefog-Lib commented 4 years ago

It might be related to win_put and win_accumulate transmitting double tensors with 232323 as dimension.

Bluefog-Lib commented 4 years ago

It is more likely due to a deadlock. It can be alleviated by setting the largest sending win_length to 1000. Shared window memory might fix this problem entirely?
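
To illustrate the workaround, here is a minimal sketch, assuming the cap means that no single transfer carries more than 1000 elements. `MAX_WIN_LENGTH`, `chunk_ranges`, and `send_in_chunks` are hypothetical names for illustration only, not part of the Bluefog API, and `send_fn` stands in for whatever call actually performs the transfer.

```python
MAX_WIN_LENGTH = 1000  # hypothetical cap, taken from the comment above


def chunk_ranges(total_length, max_length=MAX_WIN_LENGTH):
    """Yield (start, stop) pairs covering [0, total_length) in pieces of at most max_length."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    for start in range(0, total_length, max_length):
        yield start, min(start + max_length, total_length)


def send_in_chunks(flat_buffer, send_fn, max_length=MAX_WIN_LENGTH):
    """Call send_fn once per slice so that no single transfer exceeds max_length elements."""
    for start, stop in chunk_ranges(len(flat_buffer), max_length):
        send_fn(flat_buffer[start:stop])


if __name__ == "__main__":
    sent = []
    # Stand-in for a real transfer: just record each slice's length.
    send_in_chunks(list(range(2500)), lambda piece: sent.append(len(piece)))
    print(sent)
```

This only bounds the size of each message. If the root cause is a deadlock in the one-sided communication itself, smaller messages may make it less likely without removing it.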