Lc0 cudnn does not parallelize properly across multiple GPUs

I am trying to run Alexander's lc0 cudnn on 3 GPUs (1 Titan V + 2 x 1080). I use the command

lc0.exe -w weights.txt --no-smart-pruning --backend=multiplexing "--backend-opts=x(backend=cudnn,gpu=0,max_batch=512),y(backend=cudnn,gpu=1,max_batch=256),z(backend=cudnn,gpu=2,max_batch=256)" --threads=4

and then type go nodes 130000 to run a benchmark. However, adding the two 1080s to the Titan V does not raise the NPS. In fact, NPS gets slightly lower. Also, the utilization of each GPU is 30%, whereas if I run lc0 on one GPU alone, its utilization is 90%.
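To make the 1-GPU, 2-GPU, and 3-GPU runs easier to compare, the command lines could be generated from one list rather than edited by hand. The following is only a sketch: it reuses exactly the flags and values from the command above, and the helper names (`backend_opts`, `lc0_command`, `ALL_GPUS`) are hypothetical, not part of lc0.

```python
# Sketch: build the multiplexing command line for the first N GPUs.
# Only flags that appear in the command above are used; the helper
# names are made up for illustration and are not part of lc0.

def backend_opts(gpus):
    """Build the value of --backend-opts from (name, gpu_index, max_batch) tuples."""
    parts = []
    for name, gpu, max_batch in gpus:
        parts.append(f"{name}(backend=cudnn,gpu={gpu},max_batch={max_batch})")
    return ",".join(parts)


def lc0_command(gpus, weights="weights.txt", threads=4):
    """Return the argument list for one benchmark configuration."""
    return [
        "lc0.exe",
        "-w", weights,
        "--no-smart-pruning",
        "--backend=multiplexing",
        f"--backend-opts={backend_opts(gpus)}",
        f"--threads={threads}",
    ]


# The three GPUs from the report: Titan V first, then the two 1080s.
ALL_GPUS = [("x", 0, 512), ("y", 1, 256), ("z", 2, 256)]

if __name__ == "__main__":
    # Print one configuration per GPU count, for inspection only.
    # In a shell, the --backend-opts argument still needs quoting.
    for n in range(1, len(ALL_GPUS) + 1):
        print(" ".join(lc0_command(ALL_GPUS[:n])))
```

Running each printed configuration with the same go nodes 130000 benchmark would show whether NPS drops as soon as a second GPU is added or only with the third.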

Why doesn't lc0 properly parallelize / fully utilize the multiple GPUs?

glinscott / leela-chess, issue #687