bytedance / byteps

A high performance and generic framework for distributed DNN training
Other
3.63k stars 488 forks source link

How does the tensorflow scheduler plugin used in the tf_benchmark_cnn.py #442

Open sxqqslf opened 1 year ago

sxqqslf commented 1 year ago

Hi, I'm testing the bytescheduler in tensorflow, follow the tutorial, compile the libplugin.so, and load it in the benchmark script. Now, I load the library in another script (my own), but i found that the training fps is almost same, so i wonder where does the scheduler algorithm use in code, it's explicit or explicit?Thanks.

pengyanghua commented 1 year ago

If the plugin does not run, check https://github.com/bytedance/byteps/tree/bytescheduler/bytescheduler#tensorflow . If the plugin runs successfully but with no distributed training speedup, you can profile the trace and see if how much is potential improvement space and whether communication scheduling can help.