NVIDIA/gpu-monitoring-tools
Tools for monitoring NVIDIA GPUs on Linux
Apache License 2.0
1.01k stars
301 forks
issues
what is the problem of API version mismatch
#165
kentinchen
closed
3 years ago
3
Helm chart pointing to 2.3.1 container, which is not available on nvcr.io
#164
francoishernandez
closed
3 years ago
4
GPU with MIG instances
#163
crinavar
opened
3 years ago
2
Helm chart for v2.3.1
#162
glowkey
closed
3 years ago
1
dcgm-exporter pod is crashingoff
#161
anaconda2196
opened
3 years ago
29
exported_pod field not available on GKE cluster (but available on AWS)
#160
RobertLucian
opened
3 years ago
2
update helm charts for dcgm-exporter 2.3.0
#159
glowkey
closed
3 years ago
1
Error: failed to download "gpu-helm-charts/dcgm-exporter" (hint: running `helm repo update` may help)
#158
zkf85
opened
3 years ago
1
Helm chart for v2.2.0
#157
shivamerla
closed
3 years ago
1
Add helm chart for dcgm-exporter 2.1.2
#156
klueska
closed
3 years ago
1
make for specific DCGM version?
#155
biocyberman
opened
3 years ago
0
Base image fixes
#154
AssafKatz3
closed
3 years ago
1
Need help, trapped in the downloading DCGM
#153
yangfly
opened
3 years ago
1
DCGM_FI_DEV_GPU_UTIL Abnormal Output
#152
Jea-Eok-Kim
opened
3 years ago
0
Method of calculating GPU utilization when applying NVIDIA Multi-Instance GPU
#151
Jea-Eok-Kim
opened
3 years ago
13
Questions about EventType, EventData, and Xid
#150
ruiwen-zhao
closed
3 years ago
4
Inquiries on the collection of GPU resources using DCGM
#149
Jea-Eok-Kim
opened
3 years ago
1
Ubi
#148
yarongol
closed
3 years ago
1
ARM64 support
#147
danmx
opened
3 years ago
2
too many warnings and errors
#146
jelmd
opened
3 years ago
14
error running samples, could not determine kind of name for C.xxx
#145
xial-thu
closed
3 years ago
1
Prometheus Exporters Hub by this repository! Thanks! :)
#144
ralfyang
opened
3 years ago
0
dcgm-exporter missing many metrics after upgrade
#143
huww98
opened
3 years ago
9
dcgm-exporter crashes after MIG reconfiguration
#142
kpouget
opened
3 years ago
1
Erro start dcgm-exporter pod - module of DCGM that is not currently loaded
#141
josericardomcastro
opened
3 years ago
1
dcgm-exporter: DCP metrics not enabled
#140
jelmd
opened
3 years ago
2
nvmlShutdown dlcloses all handles every time
#139
robertdavidsmith
opened
3 years ago
0
Container, namespace and pod informations on metrics
#138
josericardomcastro
opened
3 years ago
9
Allow helm chart to customize kubelet path
#137
vdebergue
opened
3 years ago
0
fix API version mismatch when checking GPU health
#136
noliaoliao
closed
2 years ago
1
Error checking GPU health: API version mismatch
#135
noliaoliao
opened
3 years ago
0
dcgm-exporter doesn't start on Docker
#134
gurapomu
opened
3 years ago
3
dcgm-exporter high cpu usage
#133
ysshaoxiao
opened
3 years ago
6
Failed to install gpu-helm-charts/dcgm-exporter
#132
jasperzhong
closed
3 years ago
1
Consider adding a 'pod' labels,which aggregate data at prometheus?
#131
qingwei8
opened
3 years ago
0
Add helm chart for dcgm-exporter 2.1.1
#130
shivamerla
closed
3 years ago
1
K8s Pod/namespace information in exported fields
#129
geoberle
closed
3 years ago
4
FR: Expose option to use JSON logs
#128
etherandrius
opened
3 years ago
0
whether A100 mig is supported
#127
zhcf
opened
3 years ago
1
Exposed metrics don't follow Prometheus spec
#126
etherandrius
opened
3 years ago
0
Sudden error message about /run/prometheus/dcgm.prom
#125
kubernetian
opened
3 years ago
0
Fix spacing in each metrics line
#124
srikiz
closed
2 years ago
1
why require k8s.io/kubernetes project directly, is not recommended
#123
utobe67
opened
3 years ago
12
Custom metrics issue
#122
PaulYuanJ
opened
3 years ago
4
Change kubeVersion constrain to support eks/gke
#121
decayofmind
closed
3 years ago
1
Pods with dcgm-exporter fail to start
#120
timClicks
closed
3 years ago
12
Error watching fields: Profiling is not supported for this group of GPUs or GPU
#119
motionlife
opened
3 years ago
10
Get GPU busid failed by changing default metric csv
#118
pokerfaceSad
closed
3 years ago
1
Add Helm chart for dcgm-exporter 2.1.0
#117
dualvtable
closed
3 years ago
1
Error getting process info: Setting not configured
#116
CermakM
opened
3 years ago
0