dmwm / CMSRucio

Meta: Export cluster resource usage and health metrics #742

Open dynamic-entropy opened 6 months ago

dynamic-entropy commented 6 months ago

Enhancement Description

Track resource usage metrics such as CPU and memory

Export resource and health metrics from our cluster to MONIT

Use Case

Often, errors and misbehaving processes go unnoticed until they have caused inconvenience and someone reports them manually. This is slow, and the delay sometimes forces a recovery period on the system, which is undesirable.

Possible Solution

No response

Related Issues

No response

ericvaandering commented 5 months ago

Look at what we already have from kube-eagle

ericvaandering commented 5 months ago

@Panos512 will talk with IT, and/or @dynamic-entropy will talk with @arooshap and @vkuznet, to find out what the generic/supported way of doing this is.