apache / cloudstack

Apache CloudStack is an open-source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Export Virtual Router Health Checks to Prometheus #6212

Open joschi36 opened 2 years ago

joschi36 commented 2 years ago
ISSUE TYPE
COMPONENT NAME

API, VR

CLOUDSTACK VERSION
4.16.0.0

However, as far as my research shows, this is not available in current versions either.

CONFIGURATION

N/A

OS / ENVIRONMENT

N/A

SUMMARY

For monitoring Virtual Routers, there seems to be no way to expose the health check results on the Prometheus metrics endpoint.

Currently, we query the health check results for every Virtual Router individually with a pull-based method, which is really inefficient with a large number of routers.
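To illustrate why the per-router approach scales poorly, here is a minimal sketch of such a poller. The `fetch` callable is a hypothetical stand-in for whatever client code calls getRouterHealthCheckResults for one router; it is not part of CloudStack. The point is only that N routers mean N API calls per scrape, even when they are parallelised.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def collect_all(
    router_ids: Iterable[str],
    fetch: Callable[[str], List[dict]],  # hypothetical: one API call per router
    max_workers: int = 8,
) -> Dict[str, List[dict]]:
    """Fetch health check results for every router, one request each."""
    ids = list(router_ids)
    # Threads hide some latency, but the number of API calls still grows
    # linearly with the number of routers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch, ids))
    return dict(zip(ids, results))
```

A metrics endpoint on the management server would replace all of these calls with a single scrape.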

STEPS TO REPRODUCE
EXPECTED RESULTS

Have VR health check metrics exported at the /metrics endpoint.

I was thinking of something like this, based on the current output of getRouterHealthCheckResults:

# TYPE cloudstack_virtualrouter_healthcheck_result stateset
# HELP cloudstack_virtualrouter_healthcheck_result Virtual Router Health Check result
cloudstack_virtualrouter_healthcheck_result{checkname="connectivity.test",zone="zone1",virtualrouter="i-123456-VM",network="test-network"} 1

# TYPE cloudstack_virtualrouter_healthcheck_lastcheck gauge
# HELP cloudstack_virtualrouter_healthcheck_lastcheck Virtual Router Health Check lastcheck timestamp
# UNIT cloudstack_virtualrouter_healthcheck_lastcheck seconds
cloudstack_virtualrouter_healthcheck_lastcheck{checkname="connectivity.test",zone="zone1",virtualrouter="i-123456-VM",network="test-network"} 1649228157

This may not be 100% compliant with OpenMetrics, but I tried to follow the guidelines. Maybe somebody with more knowledge can take a look.
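As a rough sketch of how such lines could be generated, the following renders health check results into the Prometheus text format. The `HealthCheck` fields are assumptions chosen to match the labels in the proposal above, not the actual CloudStack data model, and the sketch uses `gauge` for both metrics because the classic Prometheus text format has no `stateset` type (that type exists only in OpenMetrics).

```python
from dataclasses import dataclass
from typing import Iterable

# Label names taken from the proposal; the order controls the output order.
LABELS = ("checkname", "zone", "virtualrouter", "network")


@dataclass
class HealthCheck:
    """One health check result; the field names are assumptions for this sketch."""
    checkname: str
    zone: str
    virtualrouter: str
    network: str
    success: bool
    lastcheck: int  # Unix timestamp in seconds


def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def label_set(check: HealthCheck) -> str:
    """Build the {name="value",...} part of a sample line."""
    pairs = (f'{name}="{escape_label(getattr(check, name))}"' for name in LABELS)
    return "{" + ",".join(pairs) + "}"


def render(checks: Iterable[HealthCheck]) -> str:
    """Render all checks as two gauge metric families."""
    checks = list(checks)
    prefix = "cloudstack_virtualrouter_healthcheck"
    lines = [
        f"# HELP {prefix}_result Virtual Router Health Check result",
        f"# TYPE {prefix}_result gauge",
    ]
    # 1 means the check passed, 0 means it failed.
    lines += [f"{prefix}_result{label_set(c)} {int(c.success)}" for c in checks]
    lines += [
        f"# HELP {prefix}_lastcheck Virtual Router Health Check lastcheck timestamp",
        f"# TYPE {prefix}_lastcheck gauge",
    ]
    lines += [f"{prefix}_lastcheck{label_set(c)} {c.lastcheck}" for c in checks]
    return "\n".join(lines) + "\n"


sample = HealthCheck(
    "connectivity.test", "zone1", "i-123456-VM", "test-network", True, 1649228157
)
print(render([sample]), end="")
```

Grouping all samples of one metric under a single HELP/TYPE header keeps each metric family contiguous, which the text format expects.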

A drawback of this solution is that we would lose the check result message, as Prometheus is not designed to handle free-form text like this.

ACTUAL RESULTS

Virtual Router health checks are not present on the CloudStack Prometheus metrics endpoint.

rohityadavcloud commented 2 years ago

It is possible to add new metrics; see https://github.com/apache/cloudstack/pull/4438

DaanHoogland commented 1 year ago

@NuxRo, can you have a look at this? It seems to me the health check output is a bit much for a Prometheus interface. But if we feel it is reasonable ...