NVIDIA/gpu-monitoring-tools
Tools for monitoring NVIDIA GPUs on Linux
Apache License 2.0
1.02k stars
301 forks
issues
Examples not available in the latest version of DCGM
#115
Tabrizian
opened
4 years ago
0
DCGM Python3 bindings
#114
Tabrizian
opened
4 years ago
0
Add chart for dcgm-exporter
#113
dualvtable
closed
4 years ago
1
Unable to start according to the instructions
#112
zkbutt
opened
4 years ago
1
dcgm-exporter POD CrashLoopBackOff or Error
#111
Leteong
opened
4 years ago
0
Consider adding a 'nodename' label
#110
mjpieters
closed
4 years ago
1
Grafana: GPU power total gauge, sum not useful
#109
mjpieters
opened
4 years ago
5
Make proper use of the gpu variable
#108
mjpieters
closed
4 years ago
2
nvml.h | Request for Fan Speed RPM (not percent) | NV_CTRL_THERMAL_COOLER_SPEED
#107
berglh
opened
4 years ago
0
How does Prometheus get dcgm-exporter metrics?
#106
Leteong
closed
3 years ago
12
dcgm-exporter POD CrashLoopBackOff or Error
#105
Leteong
closed
4 years ago
0
all my data is '0'
#104
darkamumu
opened
4 years ago
0
Bare Metal | /run/prometheus/dcgm.prom Not Present
#103
atulyadavtech
opened
4 years ago
0
dcgm-exporter `Error getting device information: API version mismatch`
#102
notjames
closed
4 years ago
2
Broken link in README
#101
kshcherban
opened
4 years ago
2
Update README.md
#100
chychen
closed
3 years ago
1
Add label for kubernetes node to enable data aggregation
#99
MartinForReal
closed
4 years ago
0
Make the helm chart available via hub.helm.sh
#98
patrungel
opened
4 years ago
1
Wrong value in DCGM_FI_DEV_MEMORY_TEMP for NVIDIA Tesla-T4
#97
maxbischoff
opened
4 years ago
1
dcgm-exporter failed to start on GKE cluster (v1.16.11-gke.5)
#96
Dimss
opened
4 years ago
12
Fix DCGM_EXPORTER_LISTEN value of dcgm-exporter manifest file #94
#95
nakkoh
closed
4 years ago
2
dcgm-exporter POD cannot be running
#94
nakkoh
opened
4 years ago
0
Fixed an issue on dcgm-exporter daemonset templates
#93
jimoosciuc
closed
3 years ago
2
Filter out metrics with no value
#92
treydock
closed
4 years ago
2
No labels with GPU-Card Name in dcgm-exporter
#91
vizdrag
opened
4 years ago
1
Failed to initialize NVML
#90
guleng
closed
4 years ago
1
feat(master): fix error of dcgm test suite
#89
cuisongliu
closed
4 years ago
2
method is not support test suit
#88
cuisongliu
closed
4 years ago
0
Fix DCGM exporter daemonset env variable
#87
cmurphy
closed
4 years ago
2
Address flag
#86
colm-anseo
closed
4 years ago
3
Option to pass hostname/ip along with port
#85
bbelgodere
closed
4 years ago
3
Fan status and card count requirements
#84
guleng
closed
4 years ago
2
remove docker user dcgm-exporter
#83
Komey
closed
4 years ago
3
[feat] Several useful values added
#82
maxkochubey
closed
4 years ago
1
unknown flag
#81
guleng
closed
4 years ago
3
TLS Support
#80
RenaudWasTaken
opened
4 years ago
1
added affinity field in daemonset
#79
preved911
closed
4 years ago
3
dcgm exporter produces 404 page not found
#78
bbelgodere
closed
4 years ago
7
What's the difference between nvmlDeviceResetApplicationsClocks and nvmlDeviceResetGpuLockedClocks?
#77
sirexeclp
opened
4 years ago
0
The label value doesn't have ',' so Prometheus will output 'text format parsing error in line 71: unexpected end of label value %!q(*string=0xc555d24350)'
#76
24sama
closed
4 years ago
1
The DCGM make binary failed
#75
eilinge
closed
4 years ago
2
Pod GPU Metric Docs
#74
JadCham
closed
4 years ago
2
Update helm chart
#73
zamog
closed
4 years ago
2
Fix health probe
#72
zamog
closed
4 years ago
1
Broken Link to Prom Exporter
#71
ajcollett
closed
4 years ago
3
rm
#70
iminders
closed
4 years ago
0
fix cgo issue
#69
iminders
closed
4 years ago
2
error
#68
zan666
closed
4 years ago
1
Dashboard for DCGM metrics
#67
muzammil360
closed
4 years ago
4
add gpu health metric
#66
yttan
closed
4 years ago
1