facebookresearch / FAMBench

Benchmarks to capture important workloads.
Apache License 2.0

Add multi-node compatibility to dlrm comms ubenches. #82

Closed samiwilf closed 1 year ago

samiwilf commented 2 years ago

Example run:

$ mpirun -x FI_PROVIDER="efa" -x FI_EFA_USE_DEVICE_RDMA=1 -x NCCL_DEBUG=INFO --hostfile ~/hosts -np 32 -N 8 --mca pml ^cm --mca btl tcp,self --bind-to none ./run_dlrm_ubench_train_alltoall.sh -l results -c 544000000 $(hostname -i)

(This was tested on four EC2 p4d.24xlarge instances running Ubuntu 20.04.)
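
The rank counts in the example run appear to follow from the test setup: 4 nodes with 8 ranks each (-N 8, presumably one rank per GPU, since a p4d.24xlarge has 8 GPUs) gives -np 32. A minimal sketch of that arithmetic for other cluster sizes is below. The helper names count_hosts and total_ranks are hypothetical and not part of this PR, and the sketch assumes the hostfile lists one node per line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of FAMBench): derive mpirun's -np value
# from the number of nodes and the number of ranks launched per node.

# Count hostfile entries, skipping blank lines and '#' comment lines.
# Assumes one node per line, as in a typical Open MPI hostfile.
count_hosts() {
  awk '!/^[ \t]*(#|$)/ { n++ } END { print n + 0 }' "$1"
}

# Total ranks = nodes * ranks per node.
total_ranks() {
  echo $(( $1 * $2 ))
}

# Values matching the tested setup: 4 nodes, 8 ranks per node.
nodes=4
per_node=8
echo "-np $(total_ranks "$nodes" "$per_node") -N $per_node"
```

With a real hostfile, nodes could instead be set from count_hosts ~/hosts, so the same command line scales when instances are added or removed.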