NVIDIA
/
dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0
923
stars
159
forks
issues
Start the recompiled dcgm-exporter fails to collect GPU metrics with an error
#371
15234660879
opened
3 months ago
0
dcp metrics supports gpu architecture
#370
lxzjd
closed
3 months ago
4
MIG device support for hpc_job metric labels
#369
jbrobstw
opened
3 months ago
4
Start the recompiled dcgm-exporter fails to collect GPU metrics with an error
#368
15234660879
opened
3 months ago
3
Let dcgm-exporter be a daemon
#367
zvonkok
opened
3 months ago
5
DCGM-Exporter release version 3.3.7-3.5.0
#366
glowkey
closed
3 months ago
0
Can't collecting DCP metrics
#365
jeffreyyjp
opened
3 months ago
4
DCGM exporter image vulnerable to https://nvd.nist.gov/vuln/detail/CVE-2024-24790
#364
alexglenn-ddl
opened
3 months ago
1
dcgm-exporter dont show metrics from other namespaces and pods k8s
#363
hive74
opened
4 months ago
12
dcgm-exporter log: No Kubelet socket, ignoring
#362
jeffreyyjp
closed
3 months ago
2
Protobuf handling is incorrect
#361
fbacchella
opened
4 months ago
2
dcgm-exporter crashes when run on Debian 12
#360
stevenmcastano
closed
4 months ago
1
Make nvidia resource names configurable
#359
lx1036
closed
4 months ago
1
README link about "To integrate DCGM-Exporter with Prometheus and Grafana, see the full instructions in the user guide." is already invalid
#358
jeffreyyjp
closed
4 months ago
1
Rename default PCIe metrics for better readability
#357
koshieguchi
closed
4 months ago
1
Seeking community feedback on potential new feature: Standardize labels for next major release
#356
glowkey
opened
4 months ago
6
[dashboard] Rework dashboard (MIG support, Grafana deprecations, Hostname)
#355
frittentheke
opened
4 months ago
0
Why `DCGM_FI_DEV_PCIE_{TX,RX}_THROUGHPUT` is default instead of `DCGM_FI_PROF_PCIE_{TX,RX}_BYTES `?
#354
koshieguchi
closed
4 months ago
2
Duplicated, missing or wrong metrics if using MIG, Grafana dashboard showing wrong duplicated / false values
#353
frittentheke
opened
4 months ago
2
cannot get DCGM_FI_PROF_SM_ACTIVE metrics
#352
qingfenghcy
opened
4 months ago
1
[Helm] Enable custom metrics, mount ConfigMap by default
#351
chipzoller
closed
3 months ago
32
[Helm] Enable ConfigMap mount by default
#350
chipzoller
closed
3 months ago
8
enable DCGM_EXPORTER_KUBERNETES and podrequestapi is avaiable but not found container and namespace label in Metrics
#349
Kevinz857
closed
4 months ago
4
GPU Failure Detection and Alerting Enhancement
#348
jz543fm
opened
4 months ago
14
Cannot Retrieve GPU PIDs from DCGM Metrics
#347
doronkg
closed
4 months ago
4
fix: correct metric help text
#346
pintohutch
closed
4 months ago
1
DCGM_FI_DEV_MEM_COPY_UTIL not correct always 1 or 2
#345
xuchenCN
closed
4 months ago
3
How to install dcgm-exporter on Windows Server?
#344
LittleNewton
closed
5 months ago
6
How to obtain the namespace , pod and container data
#343
aikikia
closed
4 months ago
6
`namespace` and `pod` labels are sometimes missing from metrics
#342
Altair-Bueno
opened
5 months ago
16
Switch GPU Util metric to `DCGM_FI_PROF_GR_ENGINE_ACTIVE` in NVIDIA DCGM Metrics Dashboard
#341
wabouhamad
opened
5 months ago
0
exported_pod cause issue with query -> every sample a different metrics
#340
amir-bialek
opened
5 months ago
3
can I get computeRunningProcesses and graphicsRunningProcesses this two metrics??
#339
suxwang
closed
5 months ago
1
config csv DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS, but cannot get on metrics
#338
suxwang
closed
5 months ago
2
I can't get the following metrics, but I've set the environment variable
#337
kameriso-zga
closed
5 months ago
6
nvlink metrics are not available on the gh200 gpu node
#336
AnjirwalaAnuj
opened
5 months ago
2
https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ is not signed
#335
jjziets
closed
5 months ago
2
Hello, why /var/log/nv-hostengine.log file had many ERROR [5231:5273] [[NvSwitch]] ReadNvSwitchStatusAllSwitches()
#334
13416157913
closed
5 months ago
1
Github Issue 331
#333
nvvfedorov
closed
5 months ago
0
Update Makefile
#332
jjziets
opened
6 months ago
0
Makefile missing DIST_DIR := cmd/dcgm-exporter
#331
jjziets
closed
5 months ago
1
Failed to watch metrics: Error watching fields: The third-party Profiling module returned an u
#330
287400117
opened
6 months ago
2
Could not enable kubernetes metric collection: nvml: Unknown Error
#329
287400117
opened
6 months ago
2
Profiling module failed to load
#328
hkominos
opened
6 months ago
5
hello,I use docker run -d --gpus all --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.3.6-3.4.2-ubuntu22.04 to start the container and an error message readlink: missing operand
#327
nvvfedorov
opened
6 months ago
5
feat: add pci_bus_id label for metrics
#326
fungaren
closed
5 months ago
5
DCGM Exporter Release 3.3.6-3.4.2
#325
rohit-arora-dev
closed
6 months ago
0
Executing dcgmi diag -r 3 in dcgm-exporter, the prompt shows "nvvs binary was not found"
#324
287400117
closed
6 months ago
1
Cannot build from source via Ansible
#323
Godson-A
opened
6 months ago
4
how to query rated power?
#322
wade-liwei
opened
6 months ago
1