Closed slopezz closed 11 months ago
LGTM label has been added.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: slopezz
With recent migrations we saw that the failover alert does not work on the first failover after saas-operator pod creation.

The reason is that there is a separate time series for every `redis_server`. In the latest migrations the failover occurs on a new redis server instance, so the counter goes from non-existent to `1`, and the Prometheus `rate` function does not detect it.

In the next image, filtering per shard and sentinel, there are 3 time series with value `0` (the ones from old redis_servers), and one time series with value `1` (the new redis_server).

This PR removes the `redis_server` label from the switchMasterCount metric, as was already done for failoverAbortNoGoodSlaveCount, which is the same case: we want a metric per shard only.

/kind bug
/kind release
/priority important-soon
/assign
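The `rate` behaviour described above can be illustrated with a small, self-contained Go sketch. It is not the saas-operator code and it does not use the Prometheus client. `increase` is a hypothetical helper that models only the core idea (last sample minus first sample within a window, with at least two samples required) and ignores extrapolation and counter resets. The sample values are made up, and the per-shard case assumes the series is already exported at `0` before the failover.

```go
package main

import "fmt"

// increase is a simplified, hypothetical model of a rate/increase-style
// calculation over one time series: it needs at least two samples in the
// window and returns the difference between the last and the first one.
// Real Prometheus also extrapolates and handles counter resets.
func increase(samples []float64) (float64, bool) {
	if len(samples) < 2 {
		return 0, false
	}
	return samples[len(samples)-1] - samples[0], true
}

func main() {
	// With the redis_server label: the new server's series does not exist
	// before the failover, so its first scraped value is already 1.
	newServerSeries := []float64{1, 1, 1}

	// Without the redis_server label (assumption: the per-shard series
	// was already being exported at 0 before the failover happened).
	perShardSeries := []float64{0, 0, 1}

	if v, ok := increase(newServerSeries); ok {
		fmt.Printf("per redis_server: increase=%v, alert fires=%v\n", v, v > 0)
	}
	if v, ok := increase(perShardSeries); ok {
		fmt.Printf("per shard: increase=%v, alert fires=%v\n", v, v > 0)
	}
}
```

In this model the per-server series never shows a change, because the jump from non-existent to `1` happens before its first sample, while the per-shard series records the step from `0` to `1`.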