Bluefog-Lib / bluefog

Distributed and decentralized training framework for PyTorch over graph
https://bluefog-lib.github.io/bluefog/
Apache License 2.0
291 stars, 71 forks

Allow running Win_Ops on GPU tensors through the CPU #6

Closed BichengYing closed 4 years ago

BichengYing commented 4 years ago

Since there are many issues with running win_ops over GPU-aware MPI, it is necessary to allow win_ops to run through CPU communication.

BichengYing commented 4 years ago

Done. Introduced two environment variables: BLUEFOG_WIN_ON_CPU=1 and BLUEFOG_OPS_ON_CPU=1. When those are set, all win ops and other ops copy the torch CUDA tensor to the CPU, communicate the CPU tensor through MPI, and finally copy the result back to CUDA.
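
For readers who want a picture of the staging pattern described above, here is a rough sketch, not the actual BlueFog implementation. The helper names `env_flag` and `communicate_via_cpu`, and the `comm_fn` callback standing in for the MPI call, are hypothetical. It also assumes the flag counts as enabled only when its value is exactly "1". The helper only relies on the `device`, `cpu()`, and `to()` members that a `torch.Tensor` provides.

```python
import os


def env_flag(name):
    """Return True when the environment variable `name` is set to "1".

    Assumption: treating only "1" as enabled; the library may parse it differently.
    """
    return os.environ.get(name, "0") == "1"


def communicate_via_cpu(tensor, comm_fn, env_var="BLUEFOG_WIN_ON_CPU"):
    """Hypothetical helper: stage a device tensor through host memory.

    `comm_fn` is a placeholder for whatever performs the MPI communication;
    it takes a tensor and returns a tensor.
    """
    # Flag unset, or tensor already on the CPU: communicate directly.
    if not env_flag(env_var) or tensor.device.type == "cpu":
        return comm_fn(tensor)

    original_device = tensor.device
    cpu_tensor = tensor.cpu()          # 1. copy CUDA -> CPU
    result = comm_fn(cpu_tensor)       # 2. communicate the CPU tensor
    return result.to(original_device)  # 3. copy the result back to CUDA
```

With PyTorch installed, this could be called as `communicate_via_cpu(x, some_comm_fn)` where `x` is a CUDA tensor. The trade-off is two extra host/device copies per operation in exchange for not depending on GPU-aware MPI.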