Hi, I'm trying to set up GPU monitoring via Grafana/Prometheus. I have a standalone server with two GPUs and use dcgm-exporter in a Docker container as the metrics exporter. I run the container in privileged mode with the command
docker run -d -e --priveleged -v /home/dockeradm/nvidia-smi-exporter/default-counters.csv:/etc/dcgm-exporter/default-counters.csv -p9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:2.0.13-2.1.2-ubuntu18.04
and it sees my GPUs. But it can't detect GPU processes and GPU Memory Usage.
Here is the output of the nvidia-smi utility on the host:
]$ nvidia-smi
Mon Aug 23 23:03:29 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   60C    P0    42W / 250W |   1393MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   64C    P0    47W / 250W |  10095MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     17748      C   ...189c/arasov/bin/python3.7        0MiB |
|    0   N/A  N/A     53799      C   ...189c/arasov/bin/python3.7     1389MiB |
|    1   N/A  N/A     17748      C   ...189c/arasov/bin/python3.7    10091MiB |
|    1   N/A  N/A     53799      C   ...189c/arasov/bin/python3.7        0MiB |
+-----------------------------------------------------------------------------+
And here is the output of nvidia-smi inside the container:
root@ccdc999ac0bd:/# nvidia-smi
Mon Aug 23 19:25:22 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   59C    P0    41W / 250W |   1393MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   62C    P0    46W / 250W |  10095MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Am I missing something or doing something wrong? How should I configure the container so that it detects GPU processes and GPU usage?
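One direction that may be worth checking, as a sketch under assumptions rather than a confirmed fix for this setup. In the command above, -e takes the next argument as an environment variable, so the misspelled --priveleged is most likely never applied as a flag. Separately, nvidia-smi inside a container usually cannot list host processes unless the container shares the host PID namespace (--pid=host). The dry run below only prints an adjusted command for review; the variable names are illustrative.

```shell
#!/bin/sh
# Sketch only: print an adjusted command so it can be reviewed before running.
# Assumptions (not confirmed in this thread):
#   - the empty process table is caused by the container's private PID namespace
#   - "-e --priveleged" was meant to be the "--privileged" flag
counters=/home/dockeradm/nvidia-smi-exporter/default-counters.csv
image=nvcr.io/nvidia/k8s/dcgm-exporter:2.0.13-2.1.2-ubuntu18.04

# --privileged : spelled correctly and without the stray -e in front of it
# --pid=host   : share the host PID namespace so host PIDs can be resolved
cmd="docker run -d --privileged --pid=host -v $counters:/etc/dcgm-exporter/default-counters.csv -p 9400:9400 $image"

# Dry run: this only prints the command; run the printed line to start the container.
echo "$cmd"
```

If the process table is still empty inside the container after this change, the cause is probably something other than PID namespace isolation.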