bytedance / byteps

A high performance and generic framework for distributed DNN training

benchmark with cross barrier error #427

Open panpanli521 opened 2 years ago

panpanli521 commented 2 years ago

I benchmarked the performance of BytePS with cross barrier using the script in /example/pytorch/benchmark_cross_barrier_byteps.py.

The complete commands are as follows:

export DMLC_NUM_WORKER=2
export DMLC_ROLE=scheduler
export DMLC_NUM_SERVER=2
export DMLC_PS_ROOT_URI=ip1
export DMLC_PS_ROOT_PORT=1234
export DMLC_INTERFACE=xgbe1
export DMLC_NODE_HOST=ip1
bpslaunch
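As a sanity check before launching, the settings exported on each node can be compared against each other. The following is a minimal sketch in Python and is not part of BytePS: the helper name `find_mismatches` is hypothetical, and the assumption that these four variables must be identical on every node is based only on the variables exported above.

```python
# Hypothetical helper, not part of BytePS. It compares the DMLC_* settings
# that are ASSUMED to need the same value on every node of the job.
SHARED_KEYS = (
    "DMLC_NUM_WORKER",
    "DMLC_NUM_SERVER",
    "DMLC_PS_ROOT_URI",
    "DMLC_PS_ROOT_PORT",
)


def find_mismatches(envs):
    """Return {key: {node: value}} for shared keys whose values differ.

    `envs` maps a node label to a dict of that node's environment
    variables. A variable that is missing on a node is reported as None.
    """
    mismatches = {}
    for key in SHARED_KEYS:
        values = {node: env.get(key) for node, env in envs.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


if __name__ == "__main__":
    # Values copied from the scheduler commands above.
    scheduler = {
        "DMLC_NUM_WORKER": "2",
        "DMLC_ROLE": "scheduler",
        "DMLC_NUM_SERVER": "2",
        "DMLC_PS_ROOT_URI": "ip1",
        "DMLC_PS_ROOT_PORT": "1234",
    }
    # Illustrative only: a second node with a deliberately wrong port.
    other = dict(scheduler, DMLC_ROLE="worker", DMLC_PS_ROOT_PORT="1235")
    print(find_mismatches({"scheduler": scheduler, "other": other}))
```

On each machine, the dict could be filled from `os.environ` and collected by hand; an empty result only means the compared values agree, not that the hang is explained.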

After running the commands, worker1 prints its throughput, but worker2 hangs: image

Finished: image