apache / hertzbeat

Apache HertzBeat (incubating) is a real-time monitoring system with agentless collection, high-performance clustering, Prometheus compatibility, custom monitoring, and status page building capabilities.
https://hertzbeat.apache.org/
Apache License 2.0

[Task] <Support monitoring NVIDIA GPUs> #2263

Open tomsun28 opened 4 months ago

tomsun28 commented 4 months ago

Description

Support monitoring NVIDIA GPUs for AI workloads.

Maybe we can use NVIDIA DCGM and NVML.
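
As a rough illustration of the NVML route, here is a minimal sketch that assumes metrics are read through the `nvidia-smi` CLI (which is built on NVML) instead of a direct NVML binding. The field list and the function names `parse_gpu_csv` and `query_gpus` are hypothetical and not part of HertzBeat; a real collector would likely be written in Java inside HertzBeat's collector module.

```python
import subprocess

# Fields requested from nvidia-smi; their order defines the CSV column order.
# This selection is an assumption for illustration only.
FIELDS = [
    "index",
    "name",
    "utilization.gpu",
    "memory.used",
    "memory.total",
    "temperature.gpu",
]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts.

    Values stay as strings because nvidia-smi can report placeholders
    such as "[N/A]" for metrics a device does not support.
    """
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            continue  # skip rows that do not match the requested columns
        gpus.append(dict(zip(FIELDS, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi on the local host (requires the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

On a host with the driver installed, `query_gpus()` returns one dict per GPU. DCGM would be the alternative for fleet-level collection, since it exposes similar metrics through its own exporter.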

Task List

No response

zhangshenghang commented 4 months ago

@tomsun28 Please assign this to me; I will complete it.