SovereignCloudStack / k8s-observability

Deployment manifests and knowledge base for the KaaS observability solution
https://scs.community/
Apache License 2.0

Unstable behavior in the monitoring.scs.community instance #74

Closed: matofeder closed this issue 1 month ago

matofeder commented 2 months ago

Fix the SCS monitoring instance to address the following issues:

matofeder commented 1 month ago

The Grafana UI-related issues disappeared after PS upgraded the underlying IaaS to version osism-8.0.0-rc.2. As we assumed, the issues were caused not by the monitoring deployment or its settings, but by problems in the underlying infrastructure (probably the load balancers).

matofeder commented 1 month ago

Three pods have issues with volume attachment.

Cinder and OpenStack wrongly report that the volumes are already attached to instance 2f471b65-53bb-49f1-b20e-b2c05555b310, but they are not.
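This kind of inconsistency (Cinder keeps an attachment record that the instance itself does not have) can be spotted by comparing the two views of the same data. The snippet below is a minimal sketch, assuming you have already exported Cinder's attachment records (volume ID to server IDs) and the server-side list of attached volumes (server ID to volume IDs), for example from `openstack volume show` and `openstack server show`. The function name `find_stale_attachments`, the input shapes, and the volume IDs `vol-a`/`vol-b` and `server-2` are hypothetical and not part of any OpenStack API.

```python
from typing import Dict, List, Set, Tuple


def find_stale_attachments(
    cinder_attachments: Dict[str, List[str]],
    server_volumes: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (volume_id, server_id) pairs that Cinder lists as attached
    but that the server itself does not report as attached.

    cinder_attachments: hypothetical export of Cinder's view,
        volume ID -> list of server IDs it claims the volume is attached to.
    server_volumes: hypothetical export of the compute side,
        server ID -> set of volume IDs actually attached to that server.
    """
    stale: List[Tuple[str, str]] = []
    # Sort for deterministic output order.
    for volume_id, server_ids in sorted(cinder_attachments.items()):
        for server_id in server_ids:
            # A server missing from the compute view counts as having no volumes.
            if volume_id not in server_volumes.get(server_id, set()):
                stale.append((volume_id, server_id))
    return stale


# Example with hypothetical data: Cinder claims vol-a is attached to the
# instance from this issue, but the instance reports no attached volumes.
cinder = {
    "vol-a": ["2f471b65-53bb-49f1-b20e-b2c05555b310"],
    "vol-b": ["server-2"],
}
nova = {
    "2f471b65-53bb-49f1-b20e-b2c05555b310": set(),
    "server-2": {"vol-b"},
}
print(find_stale_attachments(cinder, nova))
# → [('vol-a', '2f471b65-53bb-49f1-b20e-b2c05555b310')]
```

Any pair reported this way is a candidate for the manual cleanup described in the next comment; the check itself only reads data and changes nothing in the cloud.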

matofeder commented 1 month ago

The volumes of the affected pods have been removed and the monitoring stack has been redeployed. The loki-backend and loki-write pods are deployed in HA mode, so the loss of one volume should not affect their proper function. The thanos-compactor volume is also safe to remove.

After these actions, the SCS monitoring instance is healthy again.