SovereignCloudStack / issues

This repository is used for issues that are cross-repository or not bound to a specific repository.
https://github.com/orgs/SovereignCloudStack/projects/6

Monitoring for zuul.scs.community #398

Closed: fkr closed this issue 2 months ago

fkr commented 1 year ago

As a member of the SCS community, I'd like to make sure that Zuul is properly monitored.

Definition of Ready:

Definition of Done:

o-otte commented 7 months ago

Node-exporter, statsd exporter (Zuul still needs to be configured to send statsd metrics to the exporter), and cAdvisor are installed on the Zuul node. @matofeder and I will add the scraped metrics to the Observer cluster.

matofeder commented 7 months ago

The cAdvisor and node-exporter services have been redeployed using the patched versions of the roles in the zuul repository.

Both exporters have been registered in the SCS Observer cluster. To view their metrics, open the Observer cluster's main dashboard and click the Zuul host monitoring panel.

The StatsD Prometheus exporter has been decommissioned from the Zuul VM because I could not find any Grafana dashboards for Zuul that use a Prometheus datasource. I am therefore investigating adding a Graphite datasource to the monitoring stack (Observer), so that Zuul's statsd metrics can be consumed directly. The main benefit is that we could reuse the existing Zuul dashboards built for the Graphite datasource, for example https://grafana.opendev.org/d/21a6e53ea4/zuul-status

matofeder commented 7 months ago

As part of the monitoring work, the upstream contribution https://github.com/osism/ansible-collection-services/pull/1276 should allow the Zuul scheduler and the Nodepool builder/launcher to be configured to emit metrics to the statsd receiver (monitoring.scs.community).
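For context, "emitting metrics to a statsd receiver" means sending small plain-text UDP datagrams in the statsd line format `<name>:<value>|<type>`. The sketch below is illustrative only. It assumes the receiver listens on the conventional statsd UDP port 8125 (the port is not stated in this thread), and the metric name is hypothetical; Zuul and Nodepool choose their own metric names and prefix.

```python
import socket

# Assumption: the statsd receiver uses the conventional statsd UDP port 8125.
STATSD_HOST = "monitoring.scs.community"
STATSD_PORT = 8125


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Encode one metric in the plain statsd line format: <name>:<value>|<type>.

    Supported types here: "c" (counter), "g" (gauge), "ms" (timer).
    """
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported statsd metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = STATSD_HOST, port: int = STATSD_PORT) -> None:
    """Fire-and-forget UDP send, as statsd clients do; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric name, for illustration only; nothing is sent here.
    print(format_metric("zuul.example.test_counter", 1, "c").decode("ascii"))
```

Because statsd uses UDP, a misconfigured or unreachable receiver does not block the sender; the datagrams are simply lost, which is why checking the receiver side (for example in Grafana) is the practical way to confirm the configuration works.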

matofeder commented 7 months ago

This upstream contribution, https://github.com/osism/ansible-collection-services/pull/1277, allows configuring Zuul's ZooKeeper instance to expose Prometheus metrics.
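Once ZooKeeper exposes Prometheus metrics, a quick way to check the endpoint is to fetch it and parse the plain-text exposition format. This is a minimal sketch under stated assumptions: it assumes ZooKeeper's Prometheus metrics provider is serving on its default HTTP port 7000 at `/metrics`, and the host in `ZK_METRICS_URL` is a hypothetical placeholder, not the actual SCS deployment. The parser handles only simple `name value` and `name{labels} value` lines.

```python
import urllib.request

# Assumption: ZooKeeper's Prometheus metrics provider on its default port 7000.
# The host is a hypothetical placeholder.
ZK_METRICS_URL = "http://localhost:7000/metrics"


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Minimal parser for simple Prometheus text-format samples.

    Skips blank lines and comments (# HELP / # TYPE). The sample key is the
    metric name including its label set, if any; an optional trailing
    timestamp is ignored.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            name_part, rest = line.rsplit("}", 1)
            name, fields = name_part + "}", rest.split()
        else:
            parts = line.split()
            name, fields = parts[0], parts[1:]
        samples[name] = float(fields[0])
    return samples


def fetch_metrics(url: str = ZK_METRICS_URL, timeout: float = 5.0) -> dict[str, float]:
    """Fetch the metrics endpoint and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Illustrative sample text; a real endpoint returns many more metrics.
    sample = "# TYPE znode_count gauge\nznode_count 42.0\n"
    print(parse_prometheus_text(sample))
```

In practice the Observer cluster's Prometheus would scrape this endpoint directly; the script is only a manual sanity check.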

matofeder commented 2 months ago