Is a speedup from multiple GPUs available with qsim-mgpu?

Hi, I'm using the NVIDIA cuQuantum Appliance Docker container (23.03) and trying to measure the multi-GPU speedup. I use Cirq as the frontend and qsim-mgpu as the backend, and I have tested nearly all of the benchmarks provided in the GitHub repository with qubit counts ranging from 10 to 30+. However, I see performance degradation with multiple GPUs compared to using a single GPU.
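For comparisons like this, a minimal timing harness helps rule out one-time costs (device initialization, first-call overhead) skewing the single- vs. multi-GPU numbers. This is only a sketch: `run_single` and `run_multi` in the usage note are hypothetical callables wrapping the actual simulator call configured for one GPU and for several GPUs. The callable should return only once the result is back on the host, so that asynchronous GPU work is included in the measurement.

```python
import statistics
import time
from typing import Callable


def time_median(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Median wall-clock seconds of fn(), after discarding warm-up calls.

    Warm-up calls absorb one-time costs so they do not distort the comparison.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(t_single: float, t_multi: float) -> float:
    """Ratio of single-GPU to multi-GPU time; > 1.0 means multi-GPU was faster."""
    return t_single / t_multi
```

Usage, with the hypothetical callables: `speedup(time_median(run_single), time_median(run_multi))`. The median is used instead of the mean so a single slow outlier run does not dominate.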

I assume this is due to the data communication time among GPUs, but I would like to see the performance improvement described in the NVIDIA cuStateVec blog post.
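As a rough check on the communication hypothesis: an n-qubit state vector holds 2^n amplitudes, so, assuming single-precision complex amplitudes (8 bytes each), a 30-qubit state needs 8 GiB. One plausible reading is that when the whole state already fits on one GPU, splitting it across devices adds inter-GPU exchange without relieving any memory pressure. The sketch below computes state-vector size and the largest qubit count that fits on one device; the 8-byte amplitude size and the 80 GiB device are assumptions, and simulator workspace memory is ignored.

```python
def state_vector_bytes(num_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Bytes for a full state vector (8 bytes per amplitude assumes complex64)."""
    return (1 << num_qubits) * bytes_per_amplitude


def max_qubits_on_one_gpu(gpu_mem_bytes: int, bytes_per_amplitude: int = 8) -> int:
    """Largest n whose state vector fits in gpu_mem_bytes (ignores workspace)."""
    n = 0
    while state_vector_bytes(n + 1, bytes_per_amplitude) <= gpu_mem_bytes:
        n += 1
    return n


if __name__ == "__main__":
    GIB = 1 << 30
    for n in (10, 20, 30, 32):
        print(f"{n} qubits: {state_vector_bytes(n) / GIB:.6f} GiB")
    # Hypothetical device with 80 GiB of memory.
    print("max qubits on one 80 GiB GPU:", max_qubits_on_one_gpu(80 * GIB))
```

Under these assumptions every circuit in the 10 to 30 qubit range fits on a single 80 GiB device with room to spare, which would be consistent with the slowdown observed here.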

Can anyone help?

NVIDIA/cuQuantum, issue #48