NVIDIA/dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0
923 stars
159 forks
issues
Understand how exporter is able to query metrics
#421
Indresh2410
opened
4 hours ago
1
DCGM-Exporter release 3.3.9-3.6.1
#420
glowkey
closed
21 hours ago
0
replace deprecated method grpc.DialContext in favour of grpc.NewClient
#419
tariq1890
opened
1 day ago
0
DCGM_FI_DEV_GPU_UTIL abnormal point
#418
dafu-wu
opened
2 days ago
1
dcgm-exporter counter value goes down
#417
luccabb
opened
4 days ago
1
Not collecting GPU metrics; Error getting devices count: Cannot perform the requested operation because NVML doesn't exist on this system
#416
saichanumolu9
opened
5 days ago
0
Checksum mismatch for github.com/emicklei/go-restful/v3@v3.11.1
#415
WilliamVenner
opened
1 week ago
1
Compiled locally, server runs, fails
#414
basi-a
closed
2 weeks ago
0
fix: bump grpc dependency to 1.64.1
#413
pintohutch
closed
2 weeks ago
0
Segfaults with dcgm-exporter 3.3.0 and higher
#412
andrewjamesbrown
opened
2 weeks ago
4
Pod and Namespace Labels Missing in dcgm-exporter Metrics
#411
qimike
opened
2 weeks ago
2
Can dcgm-export be used with Apptainer instead of Docker?
#410
sorenwacker
closed
2 weeks ago
3
Segmentation fault when running with the default configuration for the GPU Operator on kind
#409
klueska
opened
3 weeks ago
2
failed to transform metrics for transform 'podMapper'; err: failure getting pod resources;
#408
jicki
opened
3 weeks ago
0
Why SYS_ADMIN is required?
#407
Yvanll
opened
3 weeks ago
1
Fix Helm Templates Generation
#406
Indresh2410
closed
3 weeks ago
4
Helm templates not getting populated when built from source
#405
Indresh2410
closed
3 weeks ago
0
Is RTX4090 supported?
#404
fzyzcjy
closed
3 weeks ago
2
Maintain uniformity with helm chart and static yaml's by adding securityContext
#403
Indresh2410
closed
3 weeks ago
1
Maintain uniformity with helm chart and static yaml's
#402
Indresh2410
closed
3 weeks ago
10
can exporter the uce error?
#401
zhucan
opened
4 weeks ago
1
Overhead of Enabling `DCGM_FI_PROF_SM_ACTIVE` and `DCGM_FI_PROF_SM_OCCUPANCY` Metrics
#400
hongpeng-guo
closed
1 month ago
2
I want to see how many GPU cores have been allocated to each container through metrics.
#399
changhyuni
opened
1 month ago
0
INFO[0000] Not collecting DCP metrics: This request is serviced by a module of DCGM that is not currently loaded
#398
fortminors
opened
1 month ago
5
can not collect gpu utilization metric when mig enable for some pods
#397
melikeiremguler
opened
1 month ago
1
doc: golang >= 1.23 is required
#396
stas00
closed
3 weeks ago
2
DCGM_FI_PROF_GR_ENGINE_ACTIVE not emitted on system with more than one GPU
#395
chipzoller
closed
1 month ago
2
Bug with DCGM_FI_DEV_VGPU_INSTANCE_IDS metric
#394
Deezzir
closed
1 month ago
7
dcgm-exporter daemonset Startup error Failed to pass the health check
#393
guoliangmiao
opened
1 month ago
2
In the case of gpu pass-through, does dcgm-exporter on the physical host support capturing gpu metrics of kvm virtual machines?
#392
lddlww
opened
1 month ago
1
Service monitor API value configurable
#391
dtzar
closed
4 weeks ago
0
DCGM-Exporter release 3.3.8-3.6.0
#390
glowkey
closed
2 months ago
0
Missing 3.3.8 builds
#389
xnox
closed
2 months ago
2
DCGM Exporter does not collect individual pod metrics when MPS is enabled in Kubernetes
#388
valafon
closed
2 months ago
1
DCGM Exporter in EKS p4d.24xlarge instance type controller error
#387
camilopaezrios
opened
2 months ago
0
DCGM Exporter in EKS p4d.24xlarge instance type controller error
#386
camilopaezrios
opened
2 months ago
0
DCGM-exporter pods stuck in Running State, Not getting Ready without GPU allocation.
#385
rohitreddy1698
opened
2 months ago
12
Add a health status metric for every gpu card
#384
lx1036
opened
2 months ago
1
How does the DCGM exporter work with DCGM?
#383
changhyuni
closed
2 months ago
3
fix: edit gitignore and require dir & file
#382
kschoi93
closed
2 months ago
6
Error with "make binary" operation in local development
#381
kschoi93
opened
2 months ago
0
No DCGM_FI_DEV_FB_FREE reported for MIG-enabled GPUs
#380
george-kuanli-peng
opened
2 months ago
0
Getting "Error from server (NotFound): the server could not find the metric DCGM_FI_DEV_GPU_UTIL for pods",I am not getting DCGM_FI_DEV_GPU_UTIL metrics from prometheus
#379
Vijaygawate
opened
2 months ago
2
failed to transform metrics for transform 'podMapper'
#378
jicki
opened
3 months ago
0
How does dcgm-exporter, when running on k8s as a daemonset, communicate with the host's dcgm host engine?
#377
yx-lamini
opened
3 months ago
0
Update contribution doc to require signing
#376
chipzoller
opened
3 months ago
0
Allow selecting the service's ClusterIP
#375
remram44
closed
2 weeks ago
6
Rename 'secuity' to 'security'
#374
remram44
closed
2 weeks ago
6
The pod and namespace information in the monitoring indicators of some Gpus occupied by Pods is empty
#373
qingfenghcy
opened
3 months ago
0
time="2024-08-08T03:09:05Z" level=error msg="Failed to write response." error="write tcp 10.202.3.1:9400->10.202.2.2:49674: i/o timeout
#372
safeAndSound3
opened
3 months ago
0