3scale-ops / saas-operator

3scale SaaS Operator - www.3scale.net
Apache License 2.0

Fix/Sentinel switch-master-count metric #280

Closed slopezz closed 11 months ago

slopezz commented 11 months ago

During recent migrations we saw that the failover alert does not fire on the first failover after a saas-operator pod is created.

The reason is that there is a separate time series for every redis_server. In the latest migrations the failover occurs on a new Redis server instance, so the counter goes from non-existent straight to 1, and the Prometheus rate does not pick it up.
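To illustrate why the first failover is invisible, here is a minimal sketch of how an increase over a window is computed. It is a simplified model, not Prometheus code: `simple_increase` is a hypothetical helper that ignores extrapolation and counter resets, but it keeps the relevant property that a result needs at least two samples of the same series.

```python
from typing import Optional, Sequence


def simple_increase(samples: Sequence[float]) -> Optional[float]:
    """Simplified model of a counter increase over one query window.

    Assumption: like Prometheus rate()/increase(), at least two samples of
    the same series are required. Extrapolation and resets are ignored.
    """
    if len(samples) < 2:
        return None  # a single sample has nothing to be compared against
    return samples[-1] - samples[0]


# Series of an old redis_server: exported with value 0 the whole time.
old_server = [0, 0, 0, 0]

# Series of a new redis_server: it does not exist before the failover,
# so its very first sample is already 1.
new_server = [1, 1, 1]

# A series that existed at 0 before the failover happened.
preexisting = [0, 0, 1]

print(simple_increase(old_server))   # 0
print(simple_increase(new_server))   # 0: the step from "absent" to 1 is lost
print(simple_increase(preexisting))  # 1: the step from 0 to 1 is seen
```

Only the series that already existed at 0 shows an increase, which is why an alert built on the rate of the per-server counter stays silent.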

In the next image, filtered by shard and sentinel, there are three time series with value 0 (the ones from the old redis_servers) and one time series with value 1 (the new redis_server).

image

This PR removes the redis_server label from the switchMasterCount metric, as was already done for failoverAbortNoGoodSlaveCount, which is the same case: we want one metric per shard only.
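The effect of dropping the label can be sketched with a toy labelled counter. This is not the operator's code: `CounterVec` here is a hypothetical stand-in, the label values are made up, and the sketch assumes the per-shard series is already exported with value 0 before the failover happens.

```python
class CounterVec:
    """Toy labelled counter: one series per distinct combination of label values."""

    def __init__(self, label_names):
        self.label_names = tuple(label_names)
        self.series = {}

    def _key(self, labels):
        return tuple(labels[name] for name in self.label_names)

    def init(self, **labels):
        # Export the series with value 0 so that later increments are visible.
        self.series.setdefault(self._key(labels), 0)

    def inc(self, **labels):
        key = self._key(labels)
        self.series[key] = self.series.get(key, 0) + 1


# Before: one series per (shard, redis_server). Label values are hypothetical.
per_server = CounterVec(["shard", "redis_server"])
per_server.init(shard="shard01", redis_server="old-redis:6379")
per_server.inc(shard="shard01", redis_server="new-redis:6379")
print(per_server.series)
# The old server stays at 0 and a brand-new series is created at 1.

# After: one series per shard, whichever server the failover happens on.
per_shard = CounterVec(["shard"])
per_shard.init(shard="shard01")
per_shard.inc(shard="shard01")
print(per_shard.series)
# The existing shard series moves from 0 to 1.
```

With the per-shard series the counter steps from 0 to 1 on a series that already exists, so a rate over it can register the failover.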

/kind bug
/kind release
/priority important-soon
/assign

3scale-robot commented 11 months ago

LGTM label has been added.

Git tree hash: bca0e243bd944861c0694fb6242b87a4c09a0a23

raelga commented 11 months ago

/lgtm

slopezz commented 11 months ago

/approve

3scale-robot commented 11 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: slopezz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/3scale-ops/saas-operator/blob/main/OWNERS)~~ [slopezz]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

3scale-robot commented 11 months ago

LGTM label has been added.

Git tree hash: 36d19eb2ca8b9238ac00a9ea8a55a0c162539aac