Closed balbirthomas closed 2 years ago
Reviewing the code and commit logs now I see
def _on_metrics_provider_relation_broken(self, event):
"""Block the charm when no charms contribute scrape jobs."""
if not self.unit.is_leader():
self.unit.status = WaitingStatus("inactive unit")
return
and
def _update_new_metrics_consumer(self, event):
"""Set Prometheus scrape configuration for all targets."""
logger.debug("Updating new metrics consumer")
if not self.unit.is_leader():
self.unit.status = WaitingStatus("inactive unit")
return
and so on. So it seems in the past a decision was made that if a unit is not a learder unit then it will do nothing and later on this behavior was extended to ensure non leader units set WaitingStatus
with message "inactive unit". This does make sense since Prometheus scrape config is meant to be an intermediary charm and the value of scaling it can be questionable.
In view of the above
What side effects will produce that the Metrics consumer will receive multiple identical Jobs??
What side effects will produce that the Metrics consumer will receive multiple identical Jobs??
There is de-duplication of scrape jobs (at least names) that @dstathis implemented for the Metrics consumer (Prometheus charm). Also as it stands if the leader guards shown above are removed non-leader units will produce an error when they try to set application relation data (the scrape jobs).
However all these issues can be worked around if there is a desire to do so. The key question is do we want to do so, and what do we gain by doing so.
A branch with integration test asserting that non leader units set waiting status has been created here. This branch depends on addition of an alert rule to Zinc k8s charm for which another PR has been created here.
Issue