I have a PC with one GTX 1080 Ti GPU and recently got another system in my lab with two RTX 2080 Ti GPUs. I wanted to test and compare these two systems, so I used your multigpu_cnn.py example, but the results were very strange: the performance of the new two-GPU machine was much lower than the older one! Could you please look into this issue and offer some guidance?
The first trial configuration was as follows:
The PC with one GTX 1080 Ti is a Windows 10 system configured with TensorFlow 1.12.2, CUDA Toolkit 9.0.176, cuDNN 7.3.0, and NVIDIA display driver 385.54. The Python version is 3.6.5.
The PC with two RTX 2080 Ti GPUs is a CentOS 7 Linux system configured with TensorFlow 1.12.2, CUDA Toolkit 9.0.176, cuDNN 7.3.0, NVIDIA display driver 430.34, and Python 3.6.3.
(The two-GPU PC came with CentOS and the NVIDIA graphics driver already installed, and I could not change them.)
With this configuration, the GTX 1080 Ti (setting numgpu=1) ran at about 13,000 samples/sec. The RTX 2080 Ti with numgpu=1 ran at only about 1,300 samples/sec, and with numgpu=2 the speed actually dropped to about 1,100 samples/sec!
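To make the anomaly concrete, here is a quick sketch (using only the throughput figures reported above) of the two ratios that look wrong: the per-GPU gap between the machines, and the 1-GPU to 2-GPU scaling on the new machine.

```python
# Throughput figures as reported above (samples/sec).
gtx_1080ti_1gpu = 13000  # GTX 1080 Ti, numgpu=1
rtx_2080ti_1gpu = 1300   # RTX 2080 Ti, numgpu=1
rtx_2080ti_2gpu = 1100   # RTX 2080 Ti, numgpu=2

# Per-GPU gap: the newer card runs ~10x slower here, even though an
# RTX 2080 Ti should be at least as fast as a GTX 1080 Ti.
gap = gtx_1080ti_1gpu / rtx_2080ti_1gpu
print(gap)  # → 10.0

# Scaling from 1 to 2 GPUs: anything below 1.0 means adding the second
# GPU reduced throughput instead of approaching a 2x speedup.
scaling = rtx_2080ti_2gpu / rtx_2080ti_1gpu
print(round(scaling, 2))  # → 0.85
```

So the new system is not merely failing to scale; even a single RTX 2080 Ti is an order of magnitude behind the old card, which suggests the GPUs may not be doing the work at all.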
In a second try I upgraded the RTX 2080 Ti system to TensorFlow 1.14.0, CUDA Toolkit 10.0.130, and cuDNN 7.6.3. The NVIDIA display driver remained at 430.34, and I used the native CentOS Python, version 2.7.5. The running speed was the same.
I think there is clearly a problem with the new PC, as the figures above are not reasonable. Any clue?