NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
Apache License 2.0

VolumeDriver.Create: internal error #188

Closed - HoloSound closed this issue 8 years ago

HoloSound commented 8 years ago

Regarding https://github.com/NVIDIA/nvidia-docker/issues/34, I found some commands that do not work in my configuration:

First some system information:

# uname -a
Linux studio16 4.2.0-42-lowlatency #49-Ubuntu SMP PREEMPT Tue Jun 28 23:12:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
# nvidia-smi
Fri Sep  2 15:37:59 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 670     Off  | 0000:01:00.0     N/A |                  N/A |
| 28%   33C    P8    N/A /  N/A |     86MiB /  1991MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

Additional information:

I had a runnable version with 352.93, but the package update to 352.99 caused consistency problems.

/var/log/kern.log

Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: API mismatch: the client has the version 352.99, but
Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: this kernel module has the version 352.93.  Please
Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: make sure that this kernel module and all NVIDIA driver
Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: components have the same version.
Aug 23 19:48:44 studio16 kernel: [2149830.211535] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
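
For reference, a hedged way to compare the loaded kernel module version with the version of the userspace libraries (standard paths; the nvidia-smi query flags are generic options, nothing specific to this setup):

$ cat /proc/driver/nvidia/version                                # version of the loaded kernel module
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader    # version seen by the client libraries

If the two differ, the NVRM "API mismatch" messages above are expected until the module is reloaded or the machine is rebooted.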

So I uninstalled everything and reinstalled 367.44.

... and I pulled the nvidia-docker code and compiled it with make.

(additionally make deb + dpkg -i ....deb)

with:

# nvidia-docker --version
Docker version 1.12.1, build 23cf638

Is there a way to get the nvidia-docker version information, rather than the Docker version?
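
A hedged aside: the wrapper appears to forward --version straight to Docker, so the nvidia-docker version itself is easiest to read from the package metadata, as is done later in this thread:

$ dpkg -l nvidia-docker    # shows the installed nvidia-docker package version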

But:

# nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_367.44: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

From issue #34 I thought I could use the command:

nvidia-docker volume setup

does not work:

# nvidia-docker volume setup

Usage:  docker volume COMMAND

Manage Docker volumes

Options:
      --help   Print usage

Commands:
  create      Create a volume
  inspect     Display detailed information on one or more volumes
  ls          List volumes
  rm          Remove one or more volumes

Run 'docker volume COMMAND --help' for more information on a command.
root@studio16:/var/log/upstart# type nvidia-docker
nvidia-docker is hashed (/usr/local/bin/nvidia-docker)
root@studio16:/var/log/upstart# ls -l /usr/local/bin/nvidia-docker
-rwxr-xr-x 1 root root 6784176 Sep  2 17:11 /usr/local/bin/nvidia-docker
root@studio16:/var/log/upstart# 

Did the flags change in one of the recent versions of Docker?

root@studio16:~# nvidia-docker volume ls
DRIVER              VOLUME NAME
local               nvidia_driver_352.93
root@studio16:~# 

... this volume corresponds to the previously running driver version!

Additionally, I expected to find some information in

/var/log/upstart/nvidia-docker.log

# ls -l /var/log/upstart
total 0
#

How can I create a volume matching my current driver version, 367.44?
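
A hedged aside: with the plugin running, the volume can in principle be requested explicitly. The volume driver name nvidia-docker and the exact syntax below are assumptions taken from the plugin documentation rather than anything verified in this thread, and the create will keep failing until the underlying VolumeDriver.Create error is fixed:

$ docker volume create -d nvidia-docker --name nvidia_driver_367.44
$ docker volume ls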

atrophiedbrain commented 8 years ago

I am having a very similar problem on Ubuntu 16.04. I have tried different versions of the NVIDIA driver, each on a fresh install of Ubuntu. The same error message persists every time:

docker: Error response from daemon: create nvidia_driver_367.44: VolumeDriver.Create: internal error, check logs for details. See 'docker run --help'.

I am currently getting:

docker: Error response from daemon: create nvidia_driver_364.19: VolumeDriver.Create: internal error, check logs for details. See 'docker run --help'.

I do not have a /var/log/upstart/nvidia-docker.log file either.

However, I see the following in /var/log/kern.log:

kernel: [ 1698.375241] aufs au_opts_verify:1597:docker[8045]: dirperm1 breaks the protection by the permission bits on the lower branch

Do you see this error in your /var/log/kern.log?

I wonder if we are having the same problem as the poster BLACKY_001 here: https://devtalk.nvidia.com/default/topic/960139/toubles-at-ubuntu-update/?offset=6

In addition to trying different versions of the nvidia driver, I've also tried Docker 1.9, 1.11, and 1.12. I have the same problem on each version.

I have made sure to only have a single nvidia driver installed at a time.
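
A hedged way to double-check that only one driver is present and which module is actually loaded (Ubuntu packaging assumed):

$ dpkg -l | grep -i nvidia    # installed NVIDIA driver packages
$ lsmod | grep nvidia         # kernel modules currently loaded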

HoloSound commented 8 years ago

@atrophiedbrain

NO - I do not get the message "breaks the protection by the permission bits on the lower branch" in kern.log.

Don't be surprised - that poster is also me - because I'm searching in all possible directions:

1) the NVIDIA driver - due to the version update with the new Ubuntu package
2) the Ubuntu driver rollout - maybe an unstable package rollout (shutting down the X server during the driver update?)
3) nvidia-docker - because of the volume error message

Maybe I can get a clear picture from all these puzzle pieces.

Regarding 16.04: I have also considered updating to that OS version.

What do you get with:

docker volume ls (do you have a volume whose name matches the driver version you have installed?)

and does

nvidia-docker volume setup

work?

3XX0 commented 8 years ago

nvidia-docker volume setup has been removed and should not be used.

Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: API mismatch: the client has the version 352.99, but
Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: this kernel module has the version 352.93.  Please

You installed a new driver but didn't reboot or reload the module. Make sure your driver works properly on the host (e.g. nvidia-smi)
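
For reference, a hedged sketch of reloading the module after a driver update (the display manager name and the exact set of nvidia modules depend on the installation; only remove the modules that lsmod actually shows):

$ sudo service lightdm stop      # stop X first so the module is not in use (display manager name assumed)
$ lsmod | grep nvidia            # see which nvidia modules are loaded
$ sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
$ sudo modprobe nvidia
$ nvidia-smi                     # the reported driver version should now match the client libraries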

docker: Error response from daemon: create nvidia_driver_364.19: VolumeDriver.Create: internal error, check logs for details. See 'docker run --help'.

Check the output of the logs:

On Ubuntu 14.04 (upstart):

$ cat /var/log/upstart/nvidia-docker.log

If you have systemd (CentOS 7, Ubuntu 16.04):

$ systemctl status nvidia-docker
$ journalctl -n -u nvidia-docker
atrophiedbrain commented 8 years ago

Output of my systemctl status nvidia-docker command:

nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/lib/systemd/system/nvidia-docker.service; enabled; vendor pr
   Active: active (running) since Sat 2016-09-03 21:00:38 EDT; 50s ago
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
  Process: 3021 ExecStartPost=/bin/sh -c /bin/echo unix://$SOCK_DIR/nvidia-docke
  Process: 2996 ExecStartPost=/bin/sh -c /bin/mkdir -p $( dirname $SPEC_FILE ) (
 Main PID: 2995 (nvidia-docker-p)
    Tasks: 7
   Memory: 22.0M
      CPU: 505ms
   CGroup: /system.slice/nvidia-docker.service
           └─2995 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

Sep 03 21:00:38 jm-lab systemd[1]: Starting NVIDIA Docker plugin...
Sep 03 21:00:38 jm-lab systemd[1]: Started NVIDIA Docker plugin.
Sep 03 21:00:38 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:38 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:39 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:39 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:39 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:39 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin

Output of my nvidia-smi command:

Sat Sep  3 21:04:03 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 364.19     Driver Version: 364.19         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 0000:01:00.0      On |                  N/A |
| 14%   53C    P0    56W / 195W |    188MiB /  4094MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      3082    G   /usr/lib/xorg/Xorg                             132MiB |
|    0      4010    G   compiz                                          36MiB |
+-----------------------------------------------------------------------------+
3XX0 commented 8 years ago

What about journalctl -n -u nvidia-docker?

HoloSound commented 8 years ago

Output of nvidia-smi command:

# nvidia-smi
Sun Sep  4 09:08:35 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 670     Off  | 0000:01:00.0     N/A |                  N/A |
| 28%   33C    P8    N/A /  N/A |     87MiB /  1991MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
#

Output of journalctl -n -u nvidia-docker:

# journalctl -n -u nvidia-docker
-- Logs begin at Fre 2016-09-02 22:20:36 CEST, end at Son 2016-09-04 09:06:08 CEST. --
Sep 02 22:21:02 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:02 Loading NVIDIA unified memory
Sep 02 22:21:02 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:02 Loading NVIDIA management library
Sep 02 22:21:03 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:03 Discovering GPU devices
Sep 02 22:21:05 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:05 Provisioning volumes at /var/lib/nvidia-docker/volumes
Sep 02 22:21:06 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:06 Serving plugin API at /var/lib/nvidia-docker
Sep 02 22:21:06 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:06 Serving remote API at localhost:3476
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Received activate request
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Plugins activated [VolumeDriver]
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Received create request for volume 'nvidia_driver_367.44'
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/367.4
# 

with:

# file  nvidia-cuda-mps-control 
nvidia-cuda-mps-control: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.4.0, stripped
# 

but the target directory is empty:

# ls -la /var/lib/nvidia-docker/volumes/nvidia_driver/
total 8
drwxr-xr-x 2 nvidia-docker nvidia-docker 4096 Sep  2 22:22 .
drwxr-xr-x 3 nvidia-docker nvidia-docker 4096 Sep  2 17:23 ..
# 

Output of systemctl status nvidia-docker

# systemctl status nvidia-docker -l
● nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/lib/systemd/system/nvidia-docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Fre 2016-09-02 22:21:01 CEST; 1 day 10h ago
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
  Process: 1367 ExecStartPost=/bin/sh -c /bin/echo unix://$SOCK_DIR/nvidia-docker.sock > $SPEC_FILE (code=exited, status=0/SUCCESS)
  Process: 1353 ExecStartPost=/bin/sh -c /bin/mkdir -p $( dirname $SPEC_FILE ) (code=exited, status=0/SUCCESS)
 Main PID: 1352 (nvidia-docker-p)
   Memory: 21.2M
      CPU: 577ms
   CGroup: /system.slice/nvidia-docker.service
           └─1352 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

Sep 02 22:21:02 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:02 Loading NVIDIA unified memory
Sep 02 22:21:02 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:02 Loading NVIDIA management library
Sep 02 22:21:03 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:03 Discovering GPU devices
Sep 02 22:21:05 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:05 Provisioning volumes at /var/lib/nvidia-docker/volumes
Sep 02 22:21:06 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:06 Serving plugin API at /var/lib/nvidia-docker
Sep 02 22:21:06 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:06 Serving remote API at localhost:3476
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Received activate request
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Plugins activated [VolumeDriver]
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Received create request for volume 'nvidia_driver_367.44'
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/367.44/bin/nvidia-cuda-mps-control: invalid cross-device link
#
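
A hedged reading of that last line: "invalid cross-device link" means the plugin tries to hard-link driver files (here /usr/bin/nvidia-cuda-mps-control) into its volume directory, and hard links cannot span filesystems. One way to confirm the two paths live on different filesystems:

$ df /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes

If the "Mounted on" column differs between the two lines, the link has to fail - which is the limitation described in #133 and on the nvidia-docker-plugin wiki page.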
HoloSound commented 8 years ago

@atrophiedbrain

Your output of systemctl status nvidia-docker is cut off after about 80 columns - the necessary information continues to the right!
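
A hedged tip for capturing the full lines (plain systemd options, nothing nvidia-docker specific):

$ systemctl status nvidia-docker -l --no-pager
$ journalctl -u nvidia-docker --no-pager -n 50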

3XX0 commented 8 years ago

See #133 and this comment in particular

atrophiedbrain commented 8 years ago

Output of nvidia-docker volume ls:

DRIVER VOLUME NAME

Output of nvidia-docker run --rm nvidia/cuda nvidia-smi:

docker: Error response from daemon: create nvidia_driver_364.19: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

I have no log file /var/log/upstart/nvidia-docker.log.

Output of my systemctl status nvidia-docker:

nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/lib/systemd/system/nvidia-docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2016-09-04 19:42:06 EDT; 4h 8min ago
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
  Process: 3070 ExecStartPost=/bin/sh -c /bin/echo unix://$SOCK_DIR/nvidia-docker.sock > $SPEC_FILE (code=exited, status=0/SUCCESS)
  Process: 3046 ExecStartPost=/bin/sh -c /bin/mkdir -p $( dirname $SPEC_FILE ) (code=exited, status=0/SUCCESS)
 Main PID: 3045 (nvidia-docker-p)
    Tasks: 7
   Memory: 21.7M
      CPU: 503ms
   CGroup: /system.slice/nvidia-docker.service
           └─3045 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Loading NVIDIA unified memory
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Loading NVIDIA management library
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Discovering GPU devices
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Provisioning volumes at /var/lib/nvidia-docker/volumes
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Serving plugin API at /var/lib/nvidia-docker
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Serving remote API at localhost:3476
Sep 04 23:46:12 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:12 Received activate request
Sep 04 23:46:12 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:12 Plugins activated [VolumeDriver]
Sep 04 23:46:58 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:58 Received create request for volume 'nvidia_driver_364.19'

Output of my journalctl -n -u nvidia-docker:

Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Loading NVIDIA unified memory
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Loading NVIDIA management library
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Discovering GPU devices
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Provisioning volumes at /var/lib/nvidia-docker/volumes
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Serving plugin API at /var/lib/nvidia-docker
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Serving remote API at localhost:3476
Sep 04 23:46:12 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:12 Received activate request
Sep 04 23:46:12 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:12 Plugins activated [VolumeDriver]
Sep 04 23:46:58 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:58 Received create request for volume 'nvidia_driver_364.19'
Sep 04 23:46:58 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:58 Error: link /usr/lib/nvidia-364/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/364.19/bin/nvidia-cuda-mps-control: invalid cross-device link

Found in /var/log/kern.log:

Sep  4 23:46:58 jm-lab kernel: [  294.327466] aufs au_opts_verify:1597:dockerd[3133]: dirperm1 breaks the protection by the permission bits on the lower branch
Sep  4 23:46:58 jm-lab kernel: [  294.355806] aufs au_opts_verify:1597:dockerd[3133]: dirperm1 breaks the protection by the permission bits on the lower branch
atrophiedbrain commented 8 years ago

I see from https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin#known-limitations that I need to use nvidia-docker-plugin -d to change the plugin's volume directory, since it has to be on the same mount point as the NVIDIA driver.

I tried nvidia-docker-plugin -d "/usr/nvidia-docker/volumes" but ran into problems with the following message:

nvidia-docker-plugin | 2016/09/05 00:07:18 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/09/05 00:07:18 Loading NVIDIA management library
nvidia-docker-plugin | 2016/09/05 00:07:18 Discovering GPU devices
nvidia-docker-plugin | 2016/09/05 00:07:18 Provisioning volumes at /usr/nvidia-docker/volumes
nvidia-docker-plugin | 2016/09/05 00:07:18 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2016/09/05 00:07:18 Serving remote API at localhost:3476
nvidia-docker-plugin | 2016/09/05 00:07:18 Error: listen tcp 127.0.0.1:3476: bind: address already in use
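
A hedged note on that last error: the plugin is presumably still running under systemd and already listening on 127.0.0.1:3476, so a second, manually started instance cannot bind. Stopping the unit first (or putting the -d option into the unit itself, as #133 suggests) avoids the clash:

$ sudo systemctl stop nvidia-docker
$ sudo /usr/bin/nvidia-docker-plugin -d /usr/nvidia-docker/volumes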

I followed the instructions at https://github.com/NVIDIA/nvidia-docker/issues/133#issuecomment-234138060 but now received the following message:

sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: oci runtime error: exec: "nvidia-smi": executable file not found in $PATH.

Following https://github.com/NVIDIA/nvidia-docker/issues/184, I knew I needed to change the permissions so that the nvidia-docker user can write inside /usr/local/nvidia-driver, but I was not sure which commands to run.

I tried chown nvidia-docker /usr/local/nvidia-driver but saw the following when I ran sudo nvidia-docker run --rm nvidia/cuda nvidia-smi: docker: Error response from daemon: no such volume: nvidia_driver_364.19.
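
For what it's worth, a hedged sketch of the permission change referred to above, followed by a plugin restart (the recursive flag and the nvidia-docker group are assumptions; the thread only shows a plain chown of the top-level directory):

$ sudo chown -R nvidia-docker:nvidia-docker /usr/local/nvidia-driver
$ sudo systemctl restart nvidia-docker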

While journalctl -n -u nvidia-docker showed:

Sep 05 00:37:46 jm-lab nvidia-docker-plugin[3037]: /usr/bin/nvidia-docker-plugin | 2016/09/05 00:37:46 Received mount request for volume 'nvidia_driver_364.19'
Sep 05 00:37:47 jm-lab nvidia-docker-plugin[3037]: /usr/bin/nvidia-docker-plugin | 2016/09/05 00:37:47 Received unmount request for volume 'nvidia_driver_364.19'

However, after restarting it worked!

Output of sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

Mon Sep  5 04:41:46 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 364.19     Driver Version: 364.19         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 0000:01:00.0      On |                  N/A |
|  0%   43C    P0    54W / 195W |    171MiB /  4094MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Thank you all for your help!

tl;dr: I was having the same problem described in https://github.com/NVIDIA/nvidia-docker/issues/133#issuecomment-234138060 and https://github.com/NVIDIA/nvidia-docker/issues/184, and covered by https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin#known-limitations

HoloSound commented 8 years ago

What I did:

1) apt-get purge nvidia-docker
2) mkdir /usr/local/nvidia-driver
2.5) chown nvidia-docker:nvidia-docker /usr/local/nvidia-driver <---- !!!!
3) cd compile/nvidia-docker
4) git pull
5) make deb
6) dpkg -i ./tools/dist/nvidia-docker_1.0.0~rc.3-1_amd64.deb
7) systemctl edit nvidia-docker as in #133 (see the sketch below)
8) reboot
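
A hedged sketch of what the drop-in from step 7 roughly looks like (the ExecStart line mirrors the one shown by systemctl status earlier in the thread, plus the -d path created in step 2; treat it as an illustration of the #133 comment, not an exact copy):

$ sudo systemctl edit nvidia-docker
    [Service]
    ExecStart=
    ExecStart=/usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker -d /usr/local/nvidia-driver
$ sudo systemctl daemon-reload
$ sudo systemctl restart nvidia-docker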

Result:

$ nvidia-smi
Mon Sep  5 16:43:38 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 670     Off  | 0000:01:00.0     N/A |                  N/A |
| 28%   34C    P8    N/A /  N/A |     72MiB /  1991MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
$
$ dpkg -l nvidia-docker
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                       Version                    Architecture               Description
+++-==========================================-==========================-==========================-=========================================================================================
ii  nvidia-docker                              1.0.0~rc.3-1               amd64                      NVIDIA Docker container tools
$

$ journalctl -n -u nvidia-docker -l
-- Logs begin at Mon 2016-09-05 16:35:52 CEST, end at Mon 2016-09-05 16:47:19 CEST. --
Sep 05 16:47:19 studio16 nvidia-docker-plugin[1350]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Successfully terminated
Sep 05 16:47:19 studio16 systemd[1]: Stopped NVIDIA Docker plugin.
Sep 05 16:47:19 studio16 systemd[1]: Starting NVIDIA Docker plugin...
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Loading NVIDIA unified memory
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Loading NVIDIA management library
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Discovering GPU devices
Sep 05 16:47:19 studio16 systemd[1]: Started NVIDIA Docker plugin.
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Provisioning volumes at /usr/local/nvidia-driver
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Serving plugin API at /var/lib/nvidia-docker
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Serving remote API at localhost:3476
$

nvidia-docker now uses /usr/local/nvidia-driver (on the same physical partition, as in #133).

AFTERWARDS:

$ sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
Mon Sep  5 15:13:40 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 670     Off  | 0000:01:00.0     N/A |                  N/A |
| 28%   35C    P8    N/A /  N/A |     72MiB /  1991MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
holosound@studio16:~$

... IT WORKED!

Many thanks!