cloudoperators / greenhouse

Cloud operations platform
Apache License 2.0

[EPIC] - Greenhouse Monitoring & Alerting #104

Open IvoGoman opened 3 months ago

IvoGoman commented 3 months ago

Priority

High

Description

Greenhouse controllers emit the common set of metrics exposed by controller-runtime. Additional instrumentation should give more insight into the specific error conditions of a controller's reconciliation. Furthermore, other Greenhouse components, such as cors-proxy, id-proxy, and service-proxy, also need to be actively monitored.

Metrics should be visualised with Plutono Dashboards so that the health of the overall platform, as well as that of individual components, is easy to assess.

Metrics should be used to define PrometheusAlertRules so that failure conditions can be identified and proactively resolved.
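
A minimal sketch of what such a rule could look like, assuming the Prometheus Operator's `PrometheusRule` resource is available in the cluster. It alerts on `controller_runtime_reconcile_errors_total`, one of the default controller-runtime metrics; the resource name, namespace, alert name, threshold, and severity are hypothetical placeholders, not existing Greenhouse configuration.

```yaml
# Sketch only: name, namespace, alert name, threshold, and severity are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: greenhouse-controller-alerts   # hypothetical
  namespace: greenhouse                # hypothetical
spec:
  groups:
    - name: greenhouse-controllers.rules
      rules:
        - alert: GreenhouseReconcileErrors   # hypothetical alert name
          # Fires when a controller keeps returning reconcile errors.
          # controller_runtime_reconcile_errors_total is a default controller-runtime metric.
          expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m])) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Controller {{ $labels.controller }} has been failing reconciliations for 15 minutes."
```

Rules for the proxies would need their own metrics first, since this expression only covers what controller-runtime already exposes.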

Acceptance criteria:

Reference Issues

No response

auhlig commented 1 month ago

Yes, please! We should at least invest in the basic monitoring described here before onboarding more customers.

auhlig commented 1 month ago