NVIDIA/dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0, 864 stars, 153 forks
issues
#394 Bug with DCGM_FI_DEV_VGPU_INSTANCE_IDS metric (Deezzir, closed 4 days ago, 6 comments)
#393 dcgm-exporter daemonset startup error: Failed to pass the health check (guoliangmiao, opened 5 days ago, 1 comment)
#392 In the case of GPU pass-through, does dcgm-exporter on the physical host support capturing GPU metrics of KVM virtual machines? (lddlww, opened 1 week ago, 0 comments)
#391 Service monitor API value configurable (dtzar, opened 1 week ago, 0 comments)
#390 DCGM-Exporter release 3.3.8-3.6.0 (glowkey, closed 1 week ago, 0 comments)
#389 Missing 3.3.8 builds (xnox, closed 1 week ago, 2 comments)
#388 DCGM Exporter does not collect individual pod metrics when MPS is enabled in Kubernetes (valafon, closed 2 weeks ago, 1 comment)
#387 DCGM Exporter in EKS p4d.24xlarge instance type controller error (camilopaezrios, opened 3 weeks ago, 0 comments)
#386 DCGM Exporter in EKS p4d.24xlarge instance type controller error (camilopaezrios, opened 3 weeks ago, 0 comments)
#385 DCGM-exporter pods stuck in Running state, not becoming Ready without GPU allocation (rohitreddy1698, opened 4 weeks ago, 6 comments)
#384 Add a health status metric for every GPU card (lx1036, opened 1 month ago, 1 comment)
#383 How does the DCGM exporter work with DCGM? (changhyuni, closed 3 weeks ago, 3 comments)
#382 fix: edit gitignore and require dir & file (kschoi93, closed 1 month ago, 6 comments)
#381 Error with "make binary" operation in local development (kschoi93, opened 1 month ago, 0 comments)
#380 No DCGM_FI_DEV_FB_FREE reported for MIG-enabled GPUs (george-kuanli-peng, opened 1 month ago, 0 comments)
#379 Getting "Error from server (NotFound): the server could not find the metric DCGM_FI_DEV_GPU_UTIL for pods"; I am not getting DCGM_FI_DEV_GPU_UTIL metrics from Prometheus (Vijaygawate, opened 1 month ago, 2 comments)
#378 failed to transform metrics for transform 'podMapper' (jicki, opened 1 month ago, 0 comments)
#377 How does dcgm-exporter, when running on k8s as a daemonset, communicate with the host's DCGM host engine? (yx-lamini, opened 1 month ago, 0 comments)
#376 Update contribution doc to require signing (chipzoller, opened 1 month ago, 0 comments)
#375 Allow selecting the service's ClusterIP (remram44, opened 1 month ago, 0 comments)
#374 Rename 'secuity' to 'security' (remram44, opened 1 month ago, 1 comment)
#373 The pod and namespace information in the metrics of some GPUs occupied by pods is empty (qingfenghcy, opened 1 month ago, 0 comments)
#372 time="2024-08-08T03:09:05Z" level=error msg="Failed to write response." error="write tcp 10.202.3.1:9400->10.202.2.2:49674: i/o timeout (safeAndSound3, opened 1 month ago, 0 comments)
#371 Recompiled dcgm-exporter fails to collect GPU metrics with an error at startup (15234660879, opened 2 months ago, 0 comments)
#370 DCP metrics: supported GPU architectures (lxzjd, closed 1 month ago, 4 comments)
#369 MIG device support for hpc_job metric labels (jbrobstw, opened 2 months ago, 4 comments)
#368 Recompiled dcgm-exporter fails to collect GPU metrics with an error at startup (15234660879, opened 2 months ago, 3 comments)
#367 Let dcgm-exporter be a daemon (zvonkok, opened 2 months ago, 5 comments)
#366 DCGM-Exporter release version 3.3.7-3.5.0 (glowkey, closed 2 months ago, 0 comments)
#365 Can't collect DCP metrics (jeffreyyjp, opened 2 months ago, 4 comments)
#364 DCGM exporter image vulnerable to https://nvd.nist.gov/vuln/detail/CVE-2024-24790 (alexglenn-ddl, opened 2 months ago, 1 comment)
#363 dcgm-exporter doesn't show metrics from other namespaces and pods in k8s (hive74, opened 2 months ago, 11 comments)
#362 dcgm-exporter log: No Kubelet socket, ignoring (jeffreyyjp, closed 2 months ago, 2 comments)
#361 Protobuf handling is incorrect (fbacchella, opened 2 months ago, 2 comments)
#360 dcgm-exporter crashes when run on Debian 12 (stevenmcastano, closed 2 months ago, 1 comment)
#359 Make NVIDIA resource names configurable (lx1036, closed 2 months ago, 1 comment)
#358 README link for "To integrate DCGM-Exporter with Prometheus and Grafana, see the full instructions in the user guide." is broken (jeffreyyjp, closed 2 months ago, 1 comment)
#357 Rename default PCIe metrics for better readability (koshieguchi, closed 2 months ago, 1 comment)
#356 Seeking community feedback on potential new feature: Standardize labels for next major release (glowkey, opened 2 months ago, 6 comments)
#355 [dashboard] Rework dashboard (MIG support, Grafana deprecations, Hostname) (frittentheke, opened 2 months ago, 0 comments)
#354 Why is `DCGM_FI_DEV_PCIE_{TX,RX}_THROUGHPUT` the default instead of `DCGM_FI_PROF_PCIE_{TX,RX}_BYTES`? (koshieguchi, closed 2 months ago, 2 comments)
#353 Duplicated, missing or wrong metrics when using MIG; Grafana dashboard shows wrong, duplicated or false values (frittentheke, opened 2 months ago, 2 comments)
#352 Cannot get DCGM_FI_PROF_SM_ACTIVE metrics (qingfenghcy, opened 2 months ago, 1 comment)
#351 [Helm] Enable custom metrics, mount ConfigMap by default (chipzoller, closed 1 month ago, 32 comments)
#350 [Helm] Enable ConfigMap mount by default (chipzoller, closed 2 months ago, 8 comments)
#349 With DCGM_EXPORTER_KUBERNETES enabled and podrequestapi available, container and namespace labels are not found in metrics (Kevinz857, closed 3 months ago, 3 comments)
#348 GPU Failure Detection and Alerting Enhancement (jz543fm, opened 3 months ago, 14 comments)
#347 Cannot Retrieve GPU PIDs from DCGM Metrics (doronkg, closed 3 months ago, 4 comments)
#346 fix: correct metric help text (pintohutch, closed 3 months ago, 1 comment)
#345 DCGM_FI_DEV_MEM_COPY_UTIL incorrect: always 1 or 2 (xuchenCN, closed 3 months ago, 3 comments)