apache / hertzbeat

Apache HertzBeat (incubating) is a real-time monitoring system with agentless, performance cluster, Prometheus-compatible, custom monitoring, and status page building capabilities.
https://hertzbeat.apache.org/
Apache License 2.0

[Task] <support monitoring NVIDIA gpu> #2263

Open · tomsun28 opened this issue 1 month ago

tomsun28 commented 1 month ago

Description

Support monitoring NVIDIA GPUs for AI workloads.

Maybe we can use NVIDIA DCGM and NVML.
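One low-effort way to prototype this, sketched here under stated assumptions, is to read NVML metrics through `nvidia-smi`, NVIDIA's command-line tool built on top of NVML, using its `--query-gpu` and `--format=csv,noheader,nounits` options, and then parse the CSV output into per-GPU records. The selected fields and the helper names `query_command`, `parse_gpu_csv`, and `collect` are hypothetical and not part of HertzBeat. The sketch is in Python for brevity, while HertzBeat's collector is written in Java. A production collector would more likely use NVML bindings or DCGM directly, as suggested above.

```python
import csv
import io
import shutil
import subprocess

# Fields accepted by `nvidia-smi --query-gpu`. This selection is an
# illustrative assumption, not a metric set defined by HertzBeat.
FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu",
          "memory.used", "memory.total"]


def query_command(fields=FIELDS):
    """Build the nvidia-smi invocation: CSV rows, no header, no unit suffixes."""
    return ["nvidia-smi", "--query-gpu=" + ",".join(fields),
            "--format=csv,noheader,nounits"]


def parse_gpu_csv(text, fields=FIELDS):
    """Turn nvidia-smi CSV output into one dict per GPU.

    "index" becomes an int, "name" stays a string, and every other column
    becomes a float. Non-numeric values (placeholders such as "[N/A]")
    become None instead of raising.
    """
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        gpu = {}
        for field, raw in zip(fields, row):
            value = raw.strip()
            if field == "name":
                gpu[field] = value
                continue
            try:
                gpu[field] = int(value) if field == "index" else float(value)
            except ValueError:
                gpu[field] = None
        gpus.append(gpu)
    return gpus


def collect():
    """Run nvidia-smi if it is on PATH; return [] on hosts without it."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(query_command(), capture_output=True,
                            text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Keeping parsing separate from execution means the parser can be tested against captured output on machines without a GPU, and the same records could later be mapped onto whatever metric schema the monitor template defines.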

Task List

No response

zhangshenghang commented 1 month ago

@tomsun28 Please assign it to me; I will complete it.