issues
NVIDIA/gpu-monitoring-tools
Tools for monitoring NVIDIA GPUs on Linux
Apache License 2.0
1.02k stars
301 forks
nvidia-smi to report PCIe utilization %
#215
amrragab8080
opened
3 years ago
0
How to get pod level GPU metrics
#214
faheemsohail
opened
3 years ago
0
fix typo in README.md
#213
caozhuozi
closed
3 years ago
1
dcgm-exporter can't run
#212
JohanOu
opened
3 years ago
2
dcgm exporter doesn't monitor mig disabled gpus with mixed strategy
#211
chloejiwon
opened
3 years ago
0
Bump k8s.io/kubernetes from 1.18.2 to 1.18.19 in /pkg
#210
dependabot[bot]
closed
3 years ago
2
dcgm-exporter doesn't see GPU processes and GPU memory usage
#209
lev-stas
opened
3 years ago
0
dcgm-exporter running on "g4dn.metal" in AWS EKS fails with "fatal: morestack on gsignal"
#208
SQUIDwarrior
opened
3 years ago
1
does this repository support the windows nvidia gpu?
#207
flyysr
opened
3 years ago
0
How can the k8s cluster monitor the GPU model name?
#206
jgsetogetifshjcdgjvxdgh
opened
3 years ago
0
make binary cannot find package "github.com/urfave/cli/v2"
#205
tekenny
opened
3 years ago
0
Update helm charts for 2.4.0
#204
glowkey
closed
3 years ago
1
Nvidia master rebase
#203
omer-dayan
closed
3 years ago
1
Failed to make binary
#202
sunhmy
opened
3 years ago
3
failed to make binary
#201
sunhmy
closed
3 years ago
0
GPU Utilization metric (DCGM_FI_DEV_GPU_UTIL) disabled by default
#200
Minkyu-Choi
closed
3 years ago
2
Tag is 2.4.0, not v2.4.0, so go get can't fetch the module
#199
qinzhenyi1314
opened
3 years ago
0
fix IP address in sample
#198
wweir
closed
3 years ago
1
Fixed grouping of prometheus metrics
#197
MarcusWichelmann
closed
3 years ago
2
How to monitor occupancy per SM.
#196
malixian
opened
3 years ago
5
Update the helm-chart to version 2.4.0-rc.3
#195
dbeer
closed
3 years ago
1
Log spam in nv-hostengine.log due to ReadNvSwitchStatusAllSwitches() returned No data is available
#194
jfolz
opened
3 years ago
0
GPU_I_PROFILE="<<<NULL>>>"
#193
munir-georges
opened
3 years ago
7
Fix to "No labels with GPU-Card Name in dcgm-exporter #91"
#192
julian3xl
closed
3 years ago
1
Installing the dcgm-exporter with Helm3 on OpenShift faces permissions issues
#191
vemonet
opened
3 years ago
0
Added priorityClassName support in helm chart.
#190
christianbeland
closed
3 years ago
1
exporter returns no profiling metrics after some period of time
#189
shovsj
opened
3 years ago
1
dcgm-exporter reports stale metrics if nv-hostengine is restarted
#188
bchess
opened
3 years ago
0
invalid metrics in 2.4.0rc2
#187
juliantaylor
opened
3 years ago
1
Added probes configuration to helm chart values.yaml
#186
christianbeland
closed
3 years ago
2
Ensure go test does not trigger vet warnings.
#185
vatine
closed
3 years ago
1
dcgm-exporter crashes while getting device cpu affinity
#184
eugenberend
closed
3 years ago
1
Fix GPUDevice to Model
#183
Jivvon
closed
3 years ago
1
nvidia-dcgm-exporter creates huge logs inside container
#182
boniek83
opened
3 years ago
6
How to monitor multiple GPU servers
#181
anilnokia
opened
3 years ago
1
DCGM exporter crashes when installed by helm3
#180
jiangxiaosheng
opened
3 years ago
5
GKE: access DCGM metrics from HPA
#179
JulesBelveze
closed
3 years ago
3
Create the helm chart for 2.4.0-rc2
#178
dbeer
closed
3 years ago
1
Update the helm chart for 2.4.0
#177
dbeer
closed
3 years ago
1
dcgm-exporter cannot be installed successfully on 2080Ti
#176
ReyRen
opened
3 years ago
8
#169 Support extra config map volumes on dcgm-exporter helm-chart
#175
sahare92
closed
3 years ago
3
Why do duplicate metrics occur when a job is scheduled to this server
#174
WYmindsky
opened
3 years ago
3
dcgm-exporter gpu_collector_test.go's NewDCGMCollector broken
#173
shatil
opened
3 years ago
0
dcgm-exporter unit test depends on k8s.io/kubernetes directly as a library
#172
shatil
opened
3 years ago
0
Pod run as non root user
#171
anaconda2196
closed
3 years ago
0
need `dcgmGetValuesSince` function
#170
qisikai
opened
3 years ago
1
How do I mount a custom csv file with Kubernetes ?
#169
dmrub
opened
3 years ago
1
Error retrieving DCGM MIG hierarchy: API version mismatch
#168
PatHoo
opened
3 years ago
1
Install broken on AKS
#167
RaananHadar
opened
3 years ago
4
dcgm-exporter missing metrics for A100 GPU
#166
anaconda2196
opened
3 years ago
5