Benchmark for 1 node with 4 GPUs

How do I run the FlexGen OPT-6.7B benchmark script for 1 node with 4 GPUs on 3090 GPUs?

I have already installed openmpi-bin, but when I run bash bench_6.7b_1x4.sh, the program gets stuck like this:

+ mpirun --mca btl_tcp_if_exclude lo,docker0 --mca oob_tcp_if_exclude lo,docker0 --map-by ppr:4:node:pe=6 --oversubscribe -H 127.0.1.1 --bind-to core -x OMP_NUM_THREADS=6 /home/qlchen/miniconda3/bin/python -m flexgen.dist_flex_opt --head-ip 127.0.1.1 --port 7777 --use-mpi --model facebook/opt-6.7b --gpu-batch-size 24 --percent 100 0 100 0 100 0 --comm-device cpu --cut-gen-len 5 --path _DUMMY_
No protocol specified

I would be very grateful for any help you can give me!
