Closed samiwilf closed 1 year ago
Example run: $ mpirun -x FI_PROVIDER="efa" -x FI_EFA_USE_DEVICE_RDMA=1 -x NCCL_DEBUG=INFO --hostfile ~/hosts -np 32 -N 8 --mca pml ^cm --mca btl tcp,self --bind-to none ./run_dlrm_ubench_train_alltoall.sh -l results -c 544000000 $(hostname -i)
(This was tested on 4 EC2 p4d.24xlarge instances running Ubuntu 20.04.)
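
As a rough sketch of how the rank counts in the example run relate to the test setup, the total passed to -np is the number of instances times the per-node count passed to -N. The variable names below are hypothetical and not part of the script; the sketch also assumes one rank per GPU, with 8 GPUs per p4d.24xlarge instance.

```shell
# Hypothetical variables for illustration; they are not read by the benchmark script.
NODES=4        # instances listed in the hostfile (4 in the tested setup)
PER_NODE=8     # ranks per instance, matching -N 8 (assumes one rank per GPU)

# Total rank count to pass to -np.
NP=$((NODES * PER_NODE))
echo "-np ${NP} -N ${PER_NODE}"
```

When running on a different number of instances, -np would presumably need to change in step with the hostfile while -N stays at the per-instance GPU count.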