SovereignCloudStack / k8s-observability

Deployment manifests and knowledge base for the KaaS observability solution
https://scs.community/
Apache License 2.0

Add support for CEPH monitoring #55

Closed matofeder closed 3 months ago

matofeder commented 3 months ago

As a CSP, I want to observe my CEPH-based storage and gather relevant metrics from it. I want to use an existing Prometheus-based observability solution as storage for CEPH metrics, and then I want to see relevant dashboards in Grafana.

relates to: https://github.com/SovereignCloudStack/issues/issues/576

matofeder commented 3 months ago

OSISM supports two ways of CEPH deployment: ceph-ansible and Rook.

We agreed (discussion from the IaaS meeting https://input.scs.community/2023-scs-team-iaas#Greenfield-rook-deployments-yeoldegrove) that we should be able to monitor a CEPH cluster deployed by either tool. In both cases it is the same kind of endpoint exposing the same metrics, but we need to be able to configure Prometheus to scrape "internal" k8s service endpoints as well as "external" (non-k8s) host endpoints.
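To illustrate the two scrape paths, here is a minimal Python sketch that builds a Prometheus `scrape_configs` fragment with one job per path. It is only an illustration and not necessarily how this repository wires things up. Assumptions that do not come from this thread: the job names, the `rook-ceph` namespace and `rook-ceph-mgr` service name, the example host name, and the use of port 9283 (the default of the ceph-mgr `prometheus` module).

```python
import json

# Assumption: default port of the ceph-mgr "prometheus" module.
CEPH_MGR_METRICS_PORT = 9283


def external_ceph_job(hosts, port=CEPH_MGR_METRICS_PORT):
    """Scrape job for ceph-mgr exporters on non-k8s hosts (e.g. ceph-ansible)."""
    return {
        "job_name": "ceph-external",  # hypothetical job name
        "static_configs": [{"targets": [f"{host}:{port}" for host in hosts]}],
    }


def internal_ceph_job(namespace="rook-ceph", service="rook-ceph-mgr"):
    """Scrape job for a ceph-mgr service inside the k8s cluster (e.g. Rook).

    The namespace and service names are assumptions; adjust to the deployment.
    """
    return {
        "job_name": "ceph-internal",  # hypothetical job name
        "kubernetes_sd_configs": [
            {"role": "endpoints", "namespaces": {"names": [namespace]}}
        ],
        # Keep only the endpoints that belong to the given service.
        "relabel_configs": [
            {
                "source_labels": ["__meta_kubernetes_service_name"],
                "regex": service,
                "action": "keep",
            }
        ],
    }


def build_scrape_configs(external_hosts):
    """Combine both jobs into one Prometheus config fragment."""
    return {
        "scrape_configs": [
            internal_ceph_job(),
            external_ceph_job(external_hosts),
        ]
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be pasted into a Prometheus config.
    print(json.dumps(build_scrape_configs(["ceph-mgr-0.example.org"]), indent=2))
```

Both jobs end up scraping the same metrics. The only difference is target discovery: Kubernetes service discovery for the in-cluster case, a static host list for the external case.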

matofeder commented 3 months ago

An OSISM Testbed installation has been deployed for test purposes with the following parts:

matofeder commented 3 months ago

As we agreed to support two methods of CEPH deployment (ceph-ansible and Rook), we also need to address Prometheus alerts.

The Prometheus alerts from the https://github.com/ceph/ceph-mixins project seem to be up to date and suitable for a ceph-ansible deployment.

Rook already includes Prometheus alerts ported from ceph-mixins (https://github.com/rook/rook/tree/master/deploy/charts/rook-ceph-cluster/prometheus), but these are out of sync, and there are some differences related to Rook's Kubernetes-based deployment. For example, Rook is not backed by cephadm and does not support features like the nvmeof daemon.

The Rook alerts update has been done upstream: https://github.com/rook/rook/pull/14312

matofeder commented 3 months ago

Related fix in the upstream Rook Ceph project: https://github.com/rook/rook/pull/14313