NVIDIA / gpu-monitoring-tools

Tools for monitoring NVIDIA GPUs on Linux
Apache License 2.0
1.02k stars 301 forks

Custom metrics issue #122

Open PaulYuanJ opened 4 years ago

PaulYuanJ commented 4 years ago

DCGM_FI_DEV_COMPUTE_PIDS{gpu="1", UUID="GPU-XXXXXXXXXXXXX", device="nvidia1"} ERROR - FAILED TO CONVERT TO STRING
DCGM_FI_DEV_GRAPHICS_PIDS{gpu="1", UUID="GPU-XXXXXXXXXXXXX", device="nvidia1"} 0
DCGM_FI_DEV_VGPU_PER_PROCESS_UTILIZATION{gpu="1", UUID="GPU-XXXXXXXXXXXXX", device="nvidia1"} ERROR - FAILED TO CONVERT TO STRING
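To see at a glance which samples are affected, the exporter's text output can be scanned for values that do not parse as numbers. This is only a minimal sketch and not part of dcgm-exporter: the function name `find_non_numeric_samples` and the `SAMPLE` text are hypothetical, and it assumes each sample line ends with the value (no trailing timestamp).

```python
# Hypothetical sample, modelled on the output quoted in this issue.
SAMPLE = """\
DCGM_FI_DEV_COMPUTE_PIDS{gpu="1", UUID="GPU-XXXXXXXXXXXXX", device="nvidia1"} ERROR - FAILED TO CONVERT TO STRING
DCGM_FI_DEV_GRAPHICS_PIDS{gpu="1", UUID="GPU-XXXXXXXXXXXXX", device="nvidia1"} 0
DCGM_FI_DEV_VGPU_PER_PROCESS_UTILIZATION{gpu="1", UUID="GPU-XXXXXXXXXXXXX", device="nvidia1"} ERROR - FAILED TO CONVERT TO STRING
"""


def find_non_numeric_samples(text):
    """Return (metric_name, raw_value) pairs whose value is not a number."""
    bad = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Everything after the closing brace of the label set is the value.
            head, _, value = line.rpartition("}")
            name = head.split("{", 1)[0]
        else:
            # No labels: the name and value are separated by whitespace.
            name, _, value = line.partition(" ")
        value = value.strip()
        try:
            float(value)
        except ValueError:
            bad.append((name.strip(), value))
    return bad


if __name__ == "__main__":
    for name, value in find_non_numeric_samples(SAMPLE):
        print(f"{name}: {value!r}")
```

In practice the text would come from the exporter's metrics endpoint rather than a hard-coded string.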

PaulYuanJ commented 4 years ago

My custom CSV is:
# custom metrics,,
DCGM_FI_DEV_ACCOUNTING_DATA,gauge,This field is only supported when the host engine is running as root unless you enable accounting ahead of time.
DCGM_FI_DEV_COMPUTE_PIDS,gauge,Compute processes running on the GPU.
DCGM_FI_DEV_GRAPHICS_PIDS,gauge,Graphics processes running on the GPU.
DCGM_FI_DEV_VGPU_PER_PROCESS_UTILIZATION,gauge,Utilization values for processes running within vGPU VMs using the device.
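For checking a counters file like this before mounting it into the container, a small parser can help. The sketch below assumes only what is visible in the CSV above: three comma-separated columns (field name, metric type, help text) and `#` starting a comment line. The name `parse_counters` and the `SAMPLE_CSV` string are hypothetical, and this is not how the exporter itself reads the file.

```python
# Hypothetical sample, copied from the CSV quoted in this issue.
SAMPLE_CSV = """\
# custom metrics,,
DCGM_FI_DEV_ACCOUNTING_DATA,gauge,This field is only supported when the host engine is running as root unless you enable accounting ahead of time.
DCGM_FI_DEV_COMPUTE_PIDS,gauge,Compute processes running on the GPU.
DCGM_FI_DEV_GRAPHICS_PIDS,gauge,Graphics processes running on the GPU.
DCGM_FI_DEV_VGPU_PER_PROCESS_UTILIZATION,gauge,Utilization values for processes running within vGPU VMs using the device.
"""


def parse_counters(text):
    """Return a list of (field, metric_type, help_text) tuples."""
    rows = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first two commas only, so the help text may contain commas.
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(parts)}")
        rows.append(tuple(parts))
    return rows


if __name__ == "__main__":
    for field, metric_type, _help in parse_counters(SAMPLE_CSV):
        print(field, metric_type)
```

A malformed row (for example one with a missing metric type column) raises `ValueError` with the offending line number.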

PaulYuanJ commented 4 years ago

My run command is:
# docker run -itd --name dcgm_exporter -v /opt/gpu_exporter/metrics.csv:/etc/dcgm-exporter/default-counters.csv --gpus all --cap-add SYS_ADMIN --rm -p 19400:9400 nvidia/dcgm-exporter:2.0.13-2.1.0-ubuntu18.04

pkclyoni commented 3 years ago

Have you managed to solve this? I am having the same issue.

matejzero commented 3 years ago

Same issue here with DCGM_FI_DEV_COMPUTE_PIDS, running the exporter built from commit 5124c9132096db52054a65281ed7abf224370cd1.