NVIDIA / gpu-monitoring-tools

Tools for monitoring NVIDIA GPUs on Linux
Apache License 2.0

What causes the "API version mismatch" error? #165

Closed kentinchen closed 3 years ago

kentinchen commented 3 years ago

time="2021-03-11T18:34:24+08:00" level=info msg="Starting dcgm-exporter"
time="2021-03-11T18:34:25+08:00" level=info msg="DCGM successfully initialized!"
time="2021-03-11T18:34:25+08:00" level=info msg="Not collecting DCP metrics: Error getting supported metrics: Profiling is not supported for this group of GPUs or GPU"
time="2021-03-11T18:34:25+08:00" level=fatal msg="Error retrieving DCGM MIG hierarchy: API version mismatch"

dcgmi --version

dcgmi version: 2.0.13

nv-hostengine --version

Version : 2.0.13
Build ID : 18
Build Date : 2020-09-29
Build Type : Release
Commit ID : v2.0.12-6-gbf6e6238
Branch Name : rel_dcgm_2_0
CPU Arch : x86_64
Build Platform : Linux 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64

nvidia-smi -L

GPU 0: GeForce RTX 2080 Ti (UUID: GPU-5119bba2-0667-ff67-25c4-f533e958b83c)

nvidia-smi

Thu Mar 11 18:37:55 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28       Driver Version: 455.28       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:08:00.0 Off |                  N/A |
| 26%   32C    P0    32W / 257W |      0MiB / 11018MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

dbeer commented 3 years ago

What version of dcgm-exporter are you running?

kentinchen commented 3 years ago

git log

commit d08ea3cdcce49498e9f7bab532df4d75351fdc0e (HEAD -> master, origin/master, origin/HEAD)

I built from source.

kentinchen commented 3 years ago

https://github.com/NVIDIA/gpu-monitoring-tools/issues/155