mathieu-b closed this issue 4 years ago
Have you installed nvidia drivers on the host system? If so, how did you accomplish that? (There are a couple of ways, but I'd recommend adding the graphics-drivers ppa). Can you execute nvidia-smi on the host system? Have you installed the nvidia-container-toolkit? Are you using docker run or docker-compose?
Hi
Here goes some info:
Docker engine version:
$ docker --version
Docker version 18.06.2-ce, build 6d37f41
nvidia-smi on host machine:
$ nvidia-smi
Tue Nov 12 13:10:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0 Off |                  N/A |
| 44%   64C    P2   115W / 250W |   3439MiB / 10989MiB |     19%      Default |
+-------------------------------+----------------------+----------------------+
Docker runtime:
$ docker info | grep "Runtime"
Runtimes: nvidia runc
Default Runtime: nvidia
nvidia-smi in container:
$ docker container run nvidia/cuda:10.1-devel-ubuntu16.04 nvidia-smi
Tue Nov 12 12:14:08 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0 Off |                  N/A |
| 44%   64C    P2   113W / 250W |   3439MiB / 10989MiB |     22%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
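For anyone repeating the runtime check from this comment in a script, here is a minimal sketch. It assumes the `docker info` text output contains a `Default Runtime:` line like the one shown earlier in this comment; the function name `default_runtime` is a hypothetical helper, not part of Docker or nvidia-docker.

```shell
#!/bin/sh
# Hypothetical helper: read `docker info` text on stdin and print the value
# of its "Default Runtime:" line (prints nothing if the line is absent).
default_runtime() {
    sed -n 's/^[[:space:]]*Default Runtime:[[:space:]]*//p'
}

# Example usage (requires a running Docker daemon, so it is left commented out):
#   if [ "$(docker info 2>/dev/null | default_runtime)" = "nvidia" ]; then
#       echo "nvidia is the default runtime"
#   fi
```

With the output posted in this comment, the helper would print `nvidia`, which matches the expectation that containers started without an explicit runtime option still use the NVIDIA runtime.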
The system was installed and configured by another person, but here is what I know:
For nvidia-docker2 to work, the exact version of the Docker engine shown above had to be used.
I see that on the main page of the GitHub repository, NVIDIA seems to have updated their "main" instructions for a more recent version of the Docker Engine, and it looks like they deprecated the "old" instructions.
Maybe a newer version / updated installation might fix the issue...
Regards
It does seem similar to this issue raised on the nvidia-docker package: https://github.com/NVIDIA/nvidia-docker/issues/854
I'd recommend updating docker, nvidia drivers, and nvidia-docker/nvidia-container-toolkit. If you're using docker run, a separate runtime has not been required since docker v19.03. See the Docker 19.03 + nvidia-container-toolkit example.
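To make the difference concrete, here is a rough sketch that picks the GPU option for docker run from a Docker Engine version string: `--gpus all` for 19.03 and later (with nvidia-container-toolkit installed), and `--runtime=nvidia` for older engines set up with nvidia-docker2. The function name `gpu_flag_for_docker_version` is a hypothetical helper made up for illustration, and the sketch assumes version strings of the form `MAJOR.MINOR.PATCH` with an optional suffix such as `-ce`.

```shell
#!/bin/sh
# Hypothetical helper: print the docker run GPU option that matches a given
# Docker Engine version string such as "18.06.2-ce" or "19.03.5".
gpu_flag_for_docker_version() {
    version=${1%%-*}        # drop a suffix such as "-ce"
    major=${version%%.*}    # text before the first dot
    rest=${version#*.}      # text after the first dot
    minor=${rest%%.*}       # text before the next dot
    minor=${minor#0}        # "03" -> "3", so it compares as a plain integer
    minor=${minor:-0}       # a minor of "0" would otherwise become empty

    if [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }; then
        echo "--gpus all"          # native GPU support, Docker 19.03+
    else
        echo "--runtime=nvidia"    # nvidia-docker2 runtime on older engines
    fi
}

# Example usage (requires Docker and a GPU host, so it is left commented out;
# $flag is intentionally unquoted so that "--gpus all" splits into two words):
#   flag=$(gpu_flag_for_docker_version "$(docker version --format '{{.Server.Version}}')")
#   docker run $flag nvidia/cuda:10.1-devel-ubuntu16.04 nvidia-smi
```

For the 18.06.2-ce engine reported earlier in this thread, the helper selects `--runtime=nvidia`; after an upgrade to 19.03 or later it would select `--gpus all`.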
I see, thanks for the heads-up. I'm not sure how soon I'll be able to test the newer version and instructions. If that happens, I'll try to report back in this thread.
Regards
Going to close this issue, but feel free to open another if you have trouble after updating.
Hello
first of all, thanks for figuring out a way to have NVIDIA GPU benchmarking working by just extending the base netdata image :pray:
I followed the instructions as reported on the Docker Hub page. I can start the container, and then access the webserver running at :19999. However, I can't see any section hinting at GPU / nvidia-smi benchmarking.
Not seeing any stats, I thought that maybe there was some issue with the execution of nvidia-smi (if they use it internally in netdata).
I tried executing nvidia-smi in the container:
but received this error:
The only way that I found to get nvidia-smi to execute successfully via docker exec was the following:
Any clues about how this issue could be solved?
Maybe I'll try to take a peek at netdata's sources to see if I can "patch" the system (supposing that the solution is indeed using LD_PRELOAD).
Best regards.