panpanli521 opened this issue 2 years ago
I benchmarked the performance of BytePS with cross-barrier enabled, using the script /example/pytorch/benchmark_cross_barrier_byteps.py.
The complete commands are as follows:

scheduler:
export DMLC_NUM_WORKER=2
export DMLC_ROLE=scheduler
export DMLC_NUM_SERVER=2
export DMLC_PS_ROOT_URI=ip1
export DMLC_PS_ROOT_PORT=1234
export DMLC_INTERFACE=xgbe1
export DMLC_NODE_HOST=ip1
bpslaunch

server1:
export DMLC_NUM_WORKER=2
export DMLC_ROLE=server
export DMLC_NUM_SERVER=2
export DMLC_PS_ROOT_URI=ip1
export DMLC_PS_ROOT_PORT=1234
export DMLC_INTERFACE=xgbe1
export DMLC_NODE_HOST=ip1
bpslaunch

server2:
export DMLC_NUM_WORKER=2
export DMLC_ROLE=server
export DMLC_NUM_SERVER=2
export DMLC_PS_ROOT_URI=ip1
export DMLC_PS_ROOT_PORT=1234
export DMLC_INTERFACE=xgbe1
export DMLC_NODE_HOST=ip2
bpslaunch

worker1:
export NVIDIA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export DMLC_WORKER_ID=0
export DMLC_NUM_WORKER=2
export DMLC_ROLE=worker
export DMLC_NUM_SERVER=2
export DMLC_PS_ROOT_URI=ip1
export DMLC_PS_ROOT_PORT=1234  # the scheduler port
export DMLC_INTERFACE=xgbe1
export DMLC_NODE_HOST=ip3
bpslaunch python3 /usr/local/byteps/example/pytorch/benchmark_cross_barrier_byteps.py --model resnet50 --batch-size 64 --num-iters 500

worker2:
export NVIDIA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export DMLC_WORKER_ID=1
export DMLC_NUM_WORKER=2
export DMLC_ROLE=worker
export DMLC_NUM_SERVER=2
export DMLC_PS_ROOT_URI=ip1
export DMLC_PS_ROOT_PORT=1234
export DMLC_INTERFACE=xgbe1
export DMLC_NODE_HOST=ip4
bpslaunch python3 /usr/local/byteps/example/pytorch/benchmark_cross_barrier_byteps.py --model resnet50 --batch-size 64 --num-iters 500

After executing the commands, worker1 prints throughput but worker2 hangs:
Finished:
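For reference, the per-role settings above differ only in DMLC_ROLE, DMLC_WORKER_ID, and DMLC_NODE_HOST, which makes it easy to mistype one of them on a single machine. A minimal dry-run sketch that prints the environment for a given role (it does not launch anything; ip1 and xgbe1 are the placeholder scheduler host and interface from the commands above, and DMLC_NODE_HOST is deliberately left out because it must be set to each machine's own address):

```shell
#!/bin/sh
# Sketch: print the BytePS environment for a given role, for eyeballing
# before launch. DMLC_NODE_HOST must still be added per machine.
byteps_env() {
  role="$1"
  worker_id="${2:-0}"
  # Settings shared by every role (ip1 = scheduler host, placeholder).
  echo "export DMLC_NUM_WORKER=2"
  echo "export DMLC_NUM_SERVER=2"
  echo "export DMLC_PS_ROOT_URI=ip1"
  echo "export DMLC_PS_ROOT_PORT=1234"   # the scheduler port
  echo "export DMLC_INTERFACE=xgbe1"
  echo "export DMLC_ROLE=$role"
  # Worker-only settings.
  if [ "$role" = "worker" ]; then
    echo "export NVIDIA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7"
    echo "export DMLC_WORKER_ID=$worker_id"
  fi
}

byteps_env worker 1
```

Comparing `byteps_env worker 0` against `byteps_env worker 1` with `diff` makes it obvious whether the two workers disagree on anything other than DMLC_WORKER_ID.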