In the past we had issues with the GTX 690: #206. We are not using the same function calls in 2.0, but maybe it's related.
Can you copy the output of nvidia-smi -q?
Hi, sure! It appears to be the same!
Just to let you know, I am running Ubuntu 16.04 LTS. On this machine I only have CUDA 9.0 and cuDNN 7, but in the container I have CUDA 8.0 and cuDNN 6 installed.
$ uname -r
4.11.0-14-lowlatency
$ dmesg | grep -i nvidia
[ 1.175191] nvidia: loading out-of-tree module taints kernel.
[ 1.175298] nvidia: module license 'NVIDIA' taints kernel.
[ 1.187647] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 1.192626] nvidia-nvlink: Nvlink Core is being initialized, major device number 245
[ 1.193015] nvidia 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 1.193350] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 384.90 Tue Sep 19 19:17:35 PDT 2017 (using threaded interrupts)
[ 1.195440] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 384.90 Tue Sep 19 17:05:19 PDT 2017
[ 1.196231] [drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver
[ 1.196330] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:04:00.0 on minor 0
[ 1.196544] [drm] [nvidia-drm] [GPU ID 0x00000500] Loading driver
[ 1.196648] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:05:00.0 on minor 1
[ 4.774623] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 243
[ 5.082426] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.0/0000:02:00.0/0000:03:08.0/0000:04:00.1/sound/card1/input23
[ 5.082523] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.0/0000:02:00.0/0000:03:08.0/0000:04:00.1/sound/card1/input24
[ 5.082614] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.0/0000:02:00.0/0000:03:08.0/0000:04:00.1/sound/card1/input25
[ 5.082666] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.0/0000:02:00.0/0000:03:10.0/0000:05:00.1/sound/card2/input22
[ 6.893867] nvidia-modeset: Allocated GPU:0 (GPU-8d95715f-6f13-b43f-cf13-a49a24d5a88b) @ PCI:0000:04:00.0
[ 6.894342] nvidia-modeset: Allocated GPU:1 (GPU-36f5b8ae-6b82-7edd-67f0-e5b88c16adc5) @ PCI:0000:05:00.0
[UPDATED]: output posted on pastebin: Output: nvidia-smi -q
Hi again,
Also, I couldn't find 'nvidia-docker-plugin' on the system. I installed nvidia-docker2 from the binary package, but it appears the plugin wasn't installed.
There is no more nvidia-docker-plugin with v2.
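With v2 the GPUs are wired in through Docker's nvidia runtime and the NVIDIA_VISIBLE_DEVICES variable instead, e.g. (image tag and device index here are just examples):
docker run -it --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  --rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 nvidia-smi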
I compiled the code snippet you suggested in #206:
$ ./nvml_crash
terminate called after throwing an instance of 'std::runtime_error'
  what():  nvmlDeviceGetTopologyCommonAncestor(dev1, dev2, &topo) error: 999
[1] 20176 abort (core dumped) ./nvml_crash
Indeed, it looks like the NVML bug. Can you try to launch a CUDA sample (e.g. deviceQuery) to see if that works?
docker run -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1 \
--rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
# apt-get update && apt-get install --no-install-recommends cuda-samples-8-0
# cd /usr/local/cuda/samples/1_Utilities/deviceQuery && make
# ./deviceQuery
Hi again,
I did the following tests:
I got strange behaviour running nvidia-smi with the environment variable NVIDIA_VISIBLE_DEVICES set to 0, 1 or all, in two different cases:
I have noticed that setting NVIDIA_VISIBLE_DEVICES=1 from the "terminal only" forces nvidia-smi to respond as if 0 had been set, but doing the same thing from a terminal in the desktop environment (lightdm active) causes the "unknown error". Weird.
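Just to be explicit about the two cases: "terminal only" means lightdm stopped, "desktop environment" means lightdm running, roughly:
sudo service lightdm stop    # "terminal only": no X server using the GPUs
docker run -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1 \
  --rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 nvidia-smi
sudo service lightdm start   # back to the "desktop environment" case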
So I decided to try newer drivers and updated from nvidia-384.90 (bundled with CUDA 9.0) to nvidia-384.98, released 11/02/2017, but nothing changed.
Then I executed the scripts you suggested, also switching the environments; here are the results:
From the terminal only and from a terminal in the desktop environment, both gave me the same result:
$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 690"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 1997 MBytes (2094202880 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Max Clock rate: 1020 MHz (1.02 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 690
Result = PASS
See that from a terminal in the desktop environment, device 1 fails:
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
See that from the TERMINAL ONLY, device 0 is returned:
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 690"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 1999 MBytes (2096300032 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Max Clock rate: 1020 MHz (1.02 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 5 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 690
Result = PASS
See that from the terminal ONLY and from a terminal in the desktop environment, both devices are returned:
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce GTX 690"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2046 MBytes (2145189888 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Max Clock rate: 1020 MHz (1.02 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce GTX 690"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2046 MBytes (2145189888 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Max Clock rate: 1020 MHz (1.02 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 5 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

> Peer access from GeForce GTX 690 (GPU0) -> GeForce GTX 690 (GPU1) : No
> Peer access from GeForce GTX 690 (GPU1) -> GeForce GTX 690 (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 2, Device0 = GeForce GTX 690, Device1 = GeForce GTX 690
Result = PASS
I haven't tested with an earlier driver (nvidia-375) yet, but I tested CUDA 8.0 with the recent driver nvidia-384.90 and got the same behaviour.
Can you provide:
- The log output of nvidia-container-runtime: edit /etc/nvidia-container-runtime/config.toml, uncomment debug=..., and run the container (drag and drop the file in argument here); example commands follow below.
- The output of findmnt and cat /sys/fs/cgroup/devices/devices.list inside the container that fails.
- The output of nvidia-smi -q and nvidia-smi outside the container after reproducing the failure.
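For example, something along these lines should collect all of it (assuming the debug line in config.toml is only commented out with a leading '#'; adjust paths if your packages differ):
# on the host: enable the hook log, then reproduce the failure
sudo sed -i 's/^#debug/debug/' /etc/nvidia-container-runtime/config.toml
docker run -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1 \
  --rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 nvidia-smi

# inside the failing container
findmnt
cat /sys/fs/cgroup/devices/devices.list

# on the host, after reproducing the failure
nvidia-smi -q
nvidia-smi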
Hi again, sorry about the delay getting back to you. So, using the current driver (384.90):
1.1 - trying to access only device 0 of GTX 690:
sudo rm /var/log/nvidia-container-runtime-hook.log && \
docker run -it \
--runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=0 \
--rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 \
nvidia-smi && \
cat /var/log/nvidia-container-runtime-hook.log > ~/dump-log-device0.txt

Tue Nov 14 15:02:32 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 33% 45C P8 N/A / N/A | 884MiB / 2045MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+

Generated content file: dump-log-device0.txt
1.2 - trying to access only device 1 of GTX 690:
sudo rm /var/log/nvidia-container-runtime-hook.log && \
docker run -it \
--runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=1 \
--rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 \
nvidia-smi && \
cat /var/log/nvidia-container-runtime-hook.log > ~/dump-log-device1.txt

Unable to determine the device handle for GPU 0000:05:00.0: Unknown Error

Generated content file: dump-log-device1.txt
One strange behaviour here: if I disable lightdm and run everything on the terminal only, trying to access device 1 automatically substitutes device 0! The same behaviour exists with the new driver 384.98, but I couldn't successfully install the earlier 375.xx driver; NVIDIA doesn't allow me to (I tried to block installation of the newer ones without success).
2.1 - findmnt
sh -c "docker run -it \
--runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=1 \
--rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 \
findmnt;" > ~/dump-findmnt-device1.txt

Generated content file: dump-findmnt-device1.txt
2.2 - cat /sys/fs/cgroup/devices/devices.list
sh -c "docker run -it \
--runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=1 \
--rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 \
cat /sys/fs/cgroup/devices/devices.list;" > ~/dump-device-list-device1.txt

Generated content file: dump-findmnt-device1.txt
sh -c "nvidia-smi && nvidia-smi -q;" > ~/dump-nvidia-smi-host.txt

Generated content file: dump-nvidia-smi-host.txt
Can you double check that the log is indeed not there? This shouldn't happen.
Also, it seems like you ran nvidia-smi from within the container, not on the host.
And device 1 is not substituted; devices are renumbered inside the container, that's expected.
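For example, with NVIDIA_VISIBLE_DEVICES=1 the only visible GPU shows up as index 0 inside the container, but it keeps the host bus ID (05:00.0 in your case); when the device is healthy you can confirm it with something like:
docker run -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1 \
  --rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 \
  nvidia-smi --query-gpu=index,pci.bus_id --format=csv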
Hi again. You're right, sorry about that! I captured the log file manually this time.
Thank you
sudo rm /var/log/nvidia-container-runtime-hook.log
docker run -it --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=1 \
--rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 nvidia-smi
cat /var/log/nvidia-container-runtime-hook.log > ~/dump-log-device1.txt
Generated content file : dump-log-device1.txt
Can you double check that the log is indeed not there? This shouldn't happen. Also, it seems like you ran nvidia-smi from within the container, not on the host.
Yeah, I remember this file being populated in the past, with another CUDA version. My CUDA Toolkit installation came from NVIDIA's CUDA 9 local repository deb file option.
sh -c "nvidia-smi && nvidia-smi -q;" > ~/new-smi-from-host.txt
content file : new-dump-smi-from-host.txt
Do you know any way I can block the new driver from being installed? Then I could jump back to CUDA 8.0 with the earlier drivers and do the tests again...
I'm afraid this is the same driver issue as #206. Can you launch the container with NVIDIA_VISIBLE_DEVICES=all, run the following program and give me back the output?
docker run -it --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=all \
--rm nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
cat > sample.cu <<EOF
#include <cuda.h>
#include <stdio.h>
#include <assert.h>
int main()
{
CUdevice dev;
int n;
assert(cuInit(0) == CUDA_SUCCESS);
assert(cuDeviceGet(&dev, 0) == CUDA_SUCCESS);
assert(cuDeviceGetAttribute(&n, CU_DEVICE_ATTRIBUTE_MULTI_GPU_BOARD, dev) == CUDA_SUCCESS);
printf("%d\n", n);
}
EOF
nvcc sample.cu -lcuda
./a.out
Hi... Yes, I also believe that it is a driver problem. The output of the script was:
root@5a2d07ac7658:/# ./a.out
1
Interesting, can you try all the other configurations to see if the result is similar (0, 1, with and without desktop)?
Sure,
I generated a container image based on nvidia/cuda:
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
ADD sample.cu /
RUN nvcc sample.cu -lcuda
CMD nvidia-smi && /a.out
Then I executed the six cases with nvidia-smi && /a.out :
Note that in the Terminal-1 case it does not fail but switches to GPU 0 (see the last output below).
Desktop-all:
Wed Nov 15 02:22:48 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 36% 50C P8 N/A / N/A | 551MiB / 2045MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 690 Off | 00000000:05:00.0 N/A | N/A |
| 34% 46C P8 N/A / N/A | 551MiB / 2045MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
1
Desktop-0:
Wed Nov 15 02:22:57 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 36% 50C P8 N/A / N/A | 551MiB / 2045MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
1
Desktop-1:
Unable to determine the device handle for GPU 0000:05:00.0: Unknown Error
Terminal-all:
Wed Nov 15 02:20:14 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 37% 53C P0 N/A / N/A | 0MiB / 1997MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 690 Off | 00000000:05:00.0 N/A | N/A |
| 34% 48C P0 N/A / N/A | 0MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
1
Terminal-0:
Wed Nov 15 02:20:02 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 37% 53C P0 N/A / N/A | 0MiB / 1997MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
1
Terminal-1:
Wed Nov 15 02:19:50 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:05:00.0 N/A | N/A |
| 33% 48C P0 N/A / N/A | 0MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
1
[UPDATED]: Desktop-1 was incorrect
Does the sample work for Desktop-1? It's not being executed because of your &&
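A quick shell illustration of the short-circuit:
nvidia-smi && /a.out   # /a.out runs only if nvidia-smi exits with status 0
nvidia-smi ;  /a.out   # /a.out runs regardless of the nvidia-smi result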
Hi again, Sorry about the confusion
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
ADD sample.cu /
RUN nvcc sample.cu -lcuda
CMD /a.out
$ docker run -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1 --rm cyberwillis/debug:latest ./a.out

Result:
a.out: sample.cu:10: int main(): Assertion `cuInit(0) == CUDA_SUCCESS' failed.
I believe I found a way to roll back to an older driver; I will try it now!
[UPDATE]: Drivers I tested. I installed these drivers and used the generated container to try to access exactly GPU 1:
NVIDIA-Linux-x86_64-304.137.run (nvidia-smi forces the GPU fan and its process never stops)
NVIDIA-Linux-x86_64-340.104.run (the container claims it needs CUDA >= 8.0)
NVIDIA-Linux-x86_64-370.28.run (same behaviour)
NVIDIA-Linux-x86_64-375.82.run (same behaviour)
NVIDIA-Linux-x86_64-384.98.run (same behaviour)
NVIDIA-Linux-x86_64-387.12.run (same behaviour)
I conclude that this is either a bug that was never solved for the GTX 690, or a BIOS-related problem.
Yes, I will report the bug internally and will update this issue once we know more about it.
Hello!
Thanks for opening this issue. Looking at this after some time, it looks like the internal bug was closed soon after. This should have been fixed in recent (or even slightly older) releases.
Sorry to disappoint. I still have this card in another machine and it still has the same problem; I am already using driver 415.27.
This is unfortunate, I'll re-open the bug internally.
Hello @cyberwillis !
We are trying to get a repro of this bug internally but are having a hard time getting our hands on a GTX 690.
Do you think you could hand us a log of the nvml crash? To do this, you need to run the NVML application with these environment variables set:
__NVML_DBG_LVL=DEBUG
__NVML_DBG_FILE=/tmp/nvml.log
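For example, to grab a log from nvidia-smi (any NVML client works the same way):
__NVML_DBG_LVL=DEBUG __NVML_DBG_FILE=/tmp/nvml.log nvidia-smi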
Thanks!
Hi @RenaudWasTaken, sorry I am late answering your question, but here it is. Thank you.
docker run -it --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=all \
-e __NVML_DBG_LVL=DEBUG \
-e __NVML_DBG_FILE=/tmp/nvml.log \
--name cuda10 \
--rm \
nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
# nvidia-smi
Accessing all the cards (0, 1):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 35% 49C P8 N/A / N/A | 690MiB / 1998MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 690 Off | 00000000:05:00.0 N/A | N/A |
| 33% 46C P8 N/A / N/A | 690MiB / 1998MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
docker run -it --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=0 \
-e __NVML_DBG_LVL=DEBUG \
-e __NVML_DBG_FILE=/tmp/nvml.log \
--name cuda10 \
--rm \
nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
# nvidia-smi
Answer: card0
Tue Feb 26 13:19:51 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 35% 48C P8 N/A / N/A | 690MiB / 1998MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
docker run -it --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=1 \
-e __NVML_DBG_LVL=DEBUG \
-e __NVML_DBG_FILE=/tmp/nvml.log \
--name cuda10 \
--rm \
nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
# nvidia-smi
Answer: card1
Unable to determine the device handle for GPU 0000:05:00.0: Unknown Error
@RenaudWasTaken Now running the following code on each container: Code
cat <<EOF | tee nvml.cc
#include <stdexcept>
#include <string>
#include <nvml.h>
// throw a runtime_error when an NVML call does not return NVML_SUCCESS
#define NVML_CALL(call) \
do { \
nvmlReturn_t ret = call; \
if (ret != NVML_SUCCESS) throw std::runtime_error(std::string(#call) + " error: " + std::to_string(ret)); \
} while (0)
int main()
{
NVML_CALL(nvmlInit());
nvmlDevice_t dev1, dev2;
NVML_CALL(nvmlDeviceGetHandleByIndex(0, &dev1));
NVML_CALL(nvmlDeviceGetHandleByIndex(1, &dev2));
nvmlGpuTopologyLevel_t topo;
NVML_CALL(nvmlDeviceGetTopologyCommonAncestor(dev1, dev2, &topo));
}
EOF
g++ -std=c++11 -I /usr/local/cuda/include nvml.cc -lnvidia-ml -o nvml
./nvml
RESULT
terminate called after throwing an instance of 'std::runtime_error'
what(): nvmlDeviceGetTopologyCommonAncestor(dev1, dev2, &topo) error: 3
Aborted (core dumped)
nvml.cc-cardall-nvml.log.tar.gz
RESULT
terminate called after throwing an instance of 'std::runtime_error'
what(): nvmlDeviceGetHandleByIndex(1, &dev2) error: 2
Aborted (core dumped)
RESULT
terminate called after throwing an instance of 'std::runtime_error'
what(): nvmlDeviceGetHandleByIndex(0, &dev1) error: 999
Aborted (core dumped)
@cyberwillis it seems like your card is an SLI slave. The easy workaround for this would be to unlink the SLI device if you need to pass the devices to different containers.
Let me know if this works for you :)
Hi @RenaudWasTaken, thank you for your fast reply.
I believe there is no way to un-SLI the GTX 690; it's a card that has two 680s in SLI mode by default. Before even doing those experiments I executed this line and restarted the computer, just to make sure:
sudo nvidia-xconfig --multigpu=off --sli=off
Although my configuration before and after the restart, when the X server is active, is the following:
Using X configuration file: "/etc/X11/xorg.conf".
ServerLayout "Layout0"
|
|--> Screen "Screen0"
| |
| |--> Monitor "Monitor0"
| | |
| | |--> VendorName "Unknown"
| | |--> ModelName "DELL U2312HM"
| | |--> HorizSync 30.0-83.0
| | |--> VertRefresh 56.0-76.0
| | |--> Option "DPMS"
| |
| |--> Device "Device0"
| | |--> Driver "nvidia"
| | |--> VendorName "NVIDIA Corporation"
| | |--> BoardName "GeForce GTX 690"
| |
| |--> Option "Coolbits" "4"
| |--> Option "Stereo" "0"
| |--> Option "nvidiaXineramaInfoOrder" "DFP-0"
| |--> Option "metamodes" "GPU-8d95715f-6f13-b43f-cf13-a49a24d5a88b.GPU-0.DVI-I-1: nvidia-auto-select +0+0, GPU-8d95715f-6f13-b43f-cf13-a49a24d5a88b.GPU-0.DVI-D-0: nvidia-auto-select +1920+0, GPU-36f5b8ae-6b82-7edd-67f0-e5b88c16adc5.GPU-1.DVI-I-1: nvidia-auto-select +3840+0"
| |--> Option "BaseMosaic" "on"
| |--> Option "Clone" "off"
| |--> Option "MultiGPU" "off"
| |--> Option "SLI" "off"
| |--> DefaultColorDepth 24
|
|--> InputDevice "Keyboard0"
| |
| |--> Driver "kbd"
| |--> Option "CoreKeyboard"
|
|--> InputDevice "Mouse0"
| |
| |--> Driver "mouse"
| |--> Option "Protocol" "auto"
| |--> Option "Device" "/dev/psaux"
| |--> Option "Emulate3Buttons" "no"
| |--> Option "ZAxisMapping" "4 5"
| |--> Option "CorePointer"
|
|--> Option "Xinerama" "0"
As you can see, only Mosaic is on, because I drive 3 monitors with this board. But that should not matter if I do the experiment with the display manager service (lightdm) stopped. So I did the same experiment with the X display manager turned off (I lose two monitors). Can you take a look?
Result of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 38% 54C P0 N/A / N/A | 0MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 690 Off | 00000000:05:00.0 N/A | N/A |
| 35% 50C P0 N/A / N/A | 0MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
terminalonly-cardall-nvml.log.tar.gz
Result of nvml
terminate called after throwing an instance of 'std::runtime_error'
what(): nvmlDeviceGetTopologyCommonAncestor(dev1, dev2, &topo) error: 3
terminalonly-nvml-cc-cardall-nvml.log.tar.gz
Result of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 37% 54C P0 N/A / N/A | 0MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Note: here the correct address of card 0 (04:00.0) is shown.
terminalonly-card0-nvml.log.tar.gz
Result of nvml
terminate called after throwing an instance of 'std::runtime_error'
what(): nvmlDeviceGetHandleByIndex(1, &dev2) error: 2
terminalonly-nvml-cc-card0-nvml.log.tar.gz
Result of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:05:00.0 N/A | N/A |
| 34% 48C P0 N/A / N/A | 0MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Note: here the correct address of card 1 (05:00.0) is shown.
terminalonly-card1-nvml.log.tar.gz
Result of nvml
terminate called after throwing an instance of 'std::runtime_error'
what(): nvmlDeviceGetHandleByIndex(1, &dev2) error: 2
terminalonly-nvml-cc-card1-nvml.log.tar.gz
it seems like your card is an SLI slave. The easy workaround for this would be to unlink the SLI device if you need to pass the devices to different containers.
Is there another way to unlink SLI? If you know some harder way, let me know too!
@RenaudWasTaken, @flx42 , @3XX0
I believe it's solved! :) I could replicate the previous Terminal-only scenario, but with the X display running this time.
Using the NVIDIA X Server Settings, I turned off Mosaic mode (Surround), then created an X screen for each individual display, enabled Xinerama and restarted the machine.
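For reference, the same change can probably be done from the command line with nvidia-xconfig; I used the GUI, so treat these flag names as an untested sketch (check nvidia-xconfig --advanced-help for the exact options):
sudo nvidia-xconfig --no-base-mosaic --separate-x-screens --xinerama
sudo service lightdm restart   # or reboot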
As Renaud said earlier:
This should have been fixed in recent (or even slightly older) releases.
Fun fact: in the past I had made this configuration and it did not work at that time.
Now I can execute this on Unity:
Note: look how each GPU here has its own memory consumption; earlier it was the same value for both.
docker run -it --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=all \
-e __NVML_DBG_LVL=DEBUG \
-e __NVML_DBG_FILE=/tmp/nvml.log \
--name cuda10 \
--rm \
nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 37% 51C P8 N/A / N/A | 868MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 690 Off | 00000000:05:00.0 N/A | N/A |
| 33% 46C P8 N/A / N/A | 441MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
docker run -it --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=0 \
-e __NVML_DBG_LVL=DEBUG \
-e __NVML_DBG_FILE=/tmp/nvml.log \
--name cuda10 \
--rm \
nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:04:00.0 N/A | N/A |
| 38% 52C P8 N/A / N/A | 868MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
docker run -it --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=1 \
-e __NVML_DBG_LVL=DEBUG \
-e __NVML_DBG_FILE=/tmp/nvml.log \
--name cuda10 \
--rm \
nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 690 Off | 00000000:05:00.0 N/A | N/A |
| 33% 46C P8 N/A / N/A | 441MiB / 1999MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Thanks to your advice to disable SLI mode, I remembered to retry enabling Xinerama mode and test it. I leave it to you to add any comments or close the issue.
Woot!
Hi guys,
I was building my nvidia/cuda image from source, and after it completed successfully I got one strange error when selecting device 1; see the results below.
BTW: I am switching from nvidia-docker 1.0 to nvidia-docker 2.0.
OK
OK
Unknown Error