NVIDIA/gpu-monitoring-tools
Tools for monitoring NVIDIA GPUs on Linux
Apache License 2.0, 1.02k stars, 301 forks
Issues (newest first)
#65 cannot work with gpushare-scheduler-extender (lazywhite, closed 4 years ago, 1 comment)
#64 pod gpu metric support for k8s-1.13 (lazywhite, closed 4 years ago, 2 comments)
#63 container PID namespace isolation with NVML (zw0610, opened 4 years ago, 3 comments)
#62 nvml.Device should support more method for detail request (mozhata, opened 4 years ago, 0 comments)
#61 Ignore errors when kubelet is not responsive. (acmore, closed 4 years ago, 0 comments)
#60 Support for Diagnostics (xing0821, opened 4 years ago, 1 comment)
#59 Getting helm chart to work on kubernetes 1.17 (aruninnanje, closed 4 years ago, 1 comment)
#58 DCGM metrics not showing via curl xxxx:9100/metrics (c0nsaw, closed 4 years ago, 1 comment)
#57 /usr/local/bin/dcgm-exporter: line 167: kill: (13453) - No such process (aaktaev, closed 4 years ago, 2 comments)
#56 dgcm-exporter failed on GPU node (aaktaev, closed 4 years ago, 7 comments)
#55 ld: unknown option: --unresolved-symbols=ignore-in-object-files (juchaosong, closed 4 years ago, 2 comments)
#54 node-exporter (damon008, closed 4 years ago, 3 comments)
#53 Replace shell command with nvml library to populate GPU metrics (Jeffwan, closed 4 years ago, 1 comment)
#52 Decouple gpu-monitoring-tools with node exporter (Jeffwan, closed 4 years ago, 1 comment)
#51 add hostname to dcgm_* metrics (sergeimonakhov, closed 4 years ago, 2 comments)
#50 Add gpu_type and device count to metrics (Jeffwan, closed 4 years ago, 4 comments)
#49 --display option bug while running docker image (JuHyung-Son, closed 4 years ago, 1 comment)
#48 Compatibility with K8N 1.16 (steffenteichmannhska, closed 4 years ago, 1 comment)
#47 Fixed dcgm-exporter to suppress nv-hostengine start with nvswitch (ChriJonesNV, closed 4 years ago, 1 comment)
#46 dcgm[-exporter] should detect crashed/hung GPU's / not be dependent CLI tools (maxx, opened 5 years ago, 2 comments)
#45 GPU metrics Dashboard outside cluster access (deepakkuk, closed 5 years ago, 2 comments)
#44 GPU metrics node exporter doesn't work in EKS (chengnignzhang, closed 4 years ago, 3 comments)
#43 Change dcgm-exporter to expose metrics through prometheus web server (Jeffwan, closed 4 years ago, 2 comments)
#42 Exporter output appears out of order (thim22, closed 4 years ago, 1 comment)
#41 node-exporter OOMKilled (deng1028, closed 4 years ago, 2 comments)
#40 Fat finger error, please delete. (bashimao, closed 5 years ago, 0 comments)
#39 Improve the bare-metal install experience of prometheus-dcgm (kriszentner, closed 4 years ago, 1 comment)
#38 The DCGM go bindings library build error (CloudWarGit, closed 5 years ago, 1 comment)
#37 what is the meaning of "dcgm_pcie_rx_throughput"? (zzpp3377, closed 4 years ago, 1 comment)
#36 [Pod GPU Metrics Exporter] Fix GPU metrics watcher stuck (takmatsu, closed 4 years ago, 5 comments)
#35 Memory.ECCErrors is null in nvml binding (Ehekatl, closed 4 years ago, 1 comment)
#34 GPU isolation not working after setting default runtime to nvidia (carlosleocadio, closed 4 years ago, 1 comment)
#33 Several bugs for node-exporter/pod-gpu-node-exporter-daemonset.yaml (Cherishty, closed 4 years ago, 6 comments)
#32 dcgm processInfo return "No data is available" (chenk008, closed 5 years ago, 7 comments)
#31 The nvml README "processInfo" show pid with "sm",the sample code not show (chenk008, closed 5 years ago, 1 comment)
#30 added gpu_type and device count (ceizner, closed 4 years ago, 3 comments)
#29 Help pulling field #4 (markjacksonfishing, closed 5 years ago, 4 comments)
#28 K8S incompatibility- Hostname in metrics (shmulikah, closed 5 years ago, 2 comments)
#27 nvidia-smi not found (papagalu, closed 5 years ago, 7 comments)
#26 Add NVML binding for nvmlSystemGetCudaDriverVersion (jjacobelli, closed 5 years ago, 0 comments)
#25 exporters/prometheus-TRTIS (vilmara, closed 5 years ago, 4 comments)
#24 nvlink bandwidth metrics (hholst80, closed 5 years ago, 1 comment)
#23 Have trouble in making docker image of pod-gpu-metrics-exporter (JunFugithub, closed 5 years ago, 1 comment)
#22 Add support for getting NVLink information from a pair of GPUs (klueska, closed 5 years ago, 1 comment)
#21 Add deviceGetBrand binding in nvml (jjacobelli, closed 5 years ago, 1 comment)
#20 does pod-devices-exporter work in kubernetes version=1.10 ? (tingzhang-ming, closed 5 years ago, 7 comments)
#19 [bug] GPU display mode wrong (cavanwang, closed 5 years ago, 1 comment)
#18 Add device compute capability info (anight, closed 5 years ago, 2 comments)
#17 nvml sample cannot work (cavanwang, closed 5 years ago, 3 comments)
#16 fix issue #15 (qieqieplus, closed 4 years ago, 3 comments)