Bluefog-Lib / bluefog

Distributed and decentralized training framework for PyTorch over graph
https://bluefog-lib.github.io/bluefog/
Apache License 2.0

Timeline related issues #10

Closed BichengYing closed 4 years ago

BichengYing commented 4 years ago
  1. Finer-grained time slicing, such as memory allocation time and local average computation time.
  2. Makefile and more unit tests. (Put)
  3. Compare GPU tensor communication using GPU-aware MPI against CPU MPI. In Bluefog, we have two env variables: BLUEFOG_WIN_ON_CPU=1 and BLUEFOG_OPS_ON_CPU=1.
  4. Asynchronous updates from multiple processes into one file.
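
For item 3, one way to set up the comparison is to run the same benchmark under several environment settings. This is a minimal sketch, assuming the two variables are read from the process environment at start-up. Only the two variable names come from this issue; the helper name `comparison_envs` and the configuration labels are hypothetical.

```python
import os

# Variable names are taken from the issue; everything else is illustrative.
CPU_FLAGS = ("BLUEFOG_WIN_ON_CPU", "BLUEFOG_OPS_ON_CPU")


def comparison_envs(base_env=None):
    """Return one environment dict per configuration to benchmark."""
    base = dict(os.environ if base_env is None else base_env)
    # Drop inherited flags so they do not leak into any run.
    for flag in CPU_FLAGS:
        base.pop(flag, None)

    envs = {"default": dict(base)}  # neither flag set
    for flag in CPU_FLAGS:
        env = dict(base)
        env[flag] = "1"  # enable exactly one flag per configuration
        envs[flag.lower()] = env
    return envs


configs = comparison_envs({"PATH": "/usr/bin", "BLUEFOG_OPS_ON_CPU": "1"})
for name, env in configs.items():
    print(name, {k: env[k] for k in CPU_FLAGS if k in env})
```

Each returned dict can be passed as `env=` to `subprocess.run` when launching the benchmark, so every configuration is timed in a fresh process.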

Overall goal: work out how to illustrate the performance advantage over other libraries.

Bonus: Merge the timeline of the Python gradient computation with that of the C++ communication.

kunyuan827 commented 4 years ago

2: Added unit tests for win_opt and edited the Makefile.

BichengYing commented 4 years ago

Items 1 and 3 and the bonus point have been finished. The only unclear thing is how to display or merge multiple processes' files into one.
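
One possible approach to the merge question, as a rough sketch only: it assumes each process writes its timeline as a list of Chrome-trace-style events (dicts with `name`, `ph`, and `ts` fields) and that timestamps across processes share a common clock. Neither assumption is confirmed in this thread. The function name `merge_timelines` and the sample events are hypothetical.

```python
import json


def merge_timelines(per_rank_events):
    """Merge per-process event lists into one list ordered by timestamp.

    per_rank_events maps a rank (int) to that process's list of event dicts.
    """
    merged = []
    for rank, events in sorted(per_rank_events.items()):
        for event in events:
            tagged = dict(event)  # copy so the input is left untouched
            tagged["pid"] = rank  # group each process's events under its rank
            merged.append(tagged)
    # Events without a timestamp (e.g. metadata) are treated as time 0.
    merged.sort(key=lambda e: e.get("ts", 0))
    return merged


# Illustrative events only; real timeline files may use other names and fields.
rank0 = [{"name": "ALLREDUCE", "ph": "B", "ts": 10},
         {"name": "ALLREDUCE", "ph": "E", "ts": 30}]
rank1 = [{"name": "ALLREDUCE", "ph": "B", "ts": 12},
         {"name": "ALLREDUCE", "ph": "E", "ts": 25}]

merged = merge_timelines({0: rank0, 1: rank1})
print(json.dumps(merged, indent=2))
```

Tagging each event with its rank as `pid` lets a trace viewer show one group of rows per process, which addresses the display half of the question under the same assumptions.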

BichengYing commented 4 years ago

@kunyuan827 Update the timeline docs and add more convex and non-convex examples.