Closed: simskij closed this issue 2 years ago.
After relating a charm to Traefik, its metrics endpoint is not updated, and Prometheus reports `health: down` because the target is no longer reachable via the local IP.
1. Users of `MetricsEndpointProvider` must be instructed to always set custom refresh events:
   ```python
   self.metrics = MetricsEndpointProvider(
       # ...
       refresh_event=[  # needed for ingress
           self.ingress.on.ready_for_unit,
           self.ingress.on.revoked_for_unit,
           self.on.update_status,
       ],
   )
   ```
2. `MetricsEndpointProvider` should always observe `update-status` by default.
3. `MetricsEndpointProvider` should update relation data on every re-init, i.e. the `MetricsEndpointProvider` constructor should call `self._set_scrape_job_spec` on every instantiation instead of registering it as an observer.
4. Roll the responsibility to the user by introducing an `update_endpoint` method, like we do in `PrometheusRemoteWriteProvider`.
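For comparison, the last option (an explicit `update_endpoint` method, as in `PrometheusRemoteWriteProvider`) could look roughly like the sketch below. It is plain Python with no `ops` dependency. `EndpointProviderSketch`, the `scrape_url` key, and the `update_endpoint` signature are hypothetical stand-ins, not the actual `prometheus_scrape` API. The point is only that the library exposes a public hook and the charm decides when to call it.

```python
from typing import Dict, Optional


class EndpointProviderSketch:
    """Hypothetical stand-in for a metrics endpoint provider (not the real library)."""

    def __init__(self, relation_data: Dict[str, str], default_url: str) -> None:
        self._relation_data = relation_data
        self._default_url = default_url
        # Initial publication uses the unit's local address.
        self._publish(default_url)

    def _publish(self, url: str) -> None:
        # Write the address Prometheus should scrape into the (simulated) relation data.
        self._relation_data["scrape_url"] = url

    def update_endpoint(self, url: Optional[str] = None) -> None:
        """Public hook: the charm calls this when its reachable URL changes."""
        self._publish(url or self._default_url)


# Usage: the charm calls update_endpoint from its own ingress handlers.
data: Dict[str, str] = {}
provider = EndpointProviderSketch(data, "http://10.1.2.3:9090")
provider.update_endpoint("http://traefik.local/cos-prometheus-0")  # ingress ready
print(data["scrape_url"])  # http://traefik.local/cos-prometheus-0
provider.update_endpoint()  # ingress revoked: fall back to the local address
print(data["scrape_url"])  # http://10.1.2.3:9090
```

The drawback of this design is that every consumer charm has to remember to wire the call into its ingress handlers.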
Ideas? @dstathis @Abuelodelanada @rbarry82 cc: @PietroPasotti
I think proposal #3 is preferable by far. It's idempotent, users don't have to do anything at all, and it doesn't depend on `update-status-interval` or on other events being emitted. It can also easily be removed from the library constructor once the `stripPrefix` middleware lands in Traefik. That makes this problem more or less disappear entirely, at least from an in-model/cluster perspective. The same holds for any external targets that have routable endpoints and don't need a path specified by a reverse proxy.
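To illustrate why proposal #3 is order-independent, here is a minimal sketch. It assumes the charm object (and therefore the provider) is re-instantiated on every hook, as in the `ops` framework. `ProviderSketch`, `run_hook`, and the `scrape_url` key are hypothetical, and the real library writes a full scrape job spec rather than a single URL.

```python
from typing import Dict, Optional

LOCAL_URL = "http://10.1.2.3:9090"  # hypothetical unit address


class ProviderSketch:
    """Hypothetical provider that republishes its scrape target in the constructor."""

    def __init__(self, relation_data: Dict[str, str], local_url: str,
                 external_url: Optional[str] = None) -> None:
        # Proposal #3: no observer registration, just (re)write the spec on every init.
        # Writing the same value again is harmless, so this is idempotent.
        relation_data["scrape_url"] = external_url or local_url


def run_hook(relation_data: Dict[str, str], ingress_url: Optional[str]) -> None:
    # Every hook constructs a fresh charm, which constructs a fresh provider.
    ProviderSketch(relation_data, LOCAL_URL, external_url=ingress_url)


data: Dict[str, str] = {}
run_hook(data, None)                                     # related to Prometheus first
run_hook(data, "http://traefik.local/cos-prometheus-0")  # then to Traefik
print(data["scrape_url"])  # http://traefik.local/cos-prometheus-0
```

Because the spec is recomputed from current state on each hook, relating to Traefik before or after Prometheus ends in the same relation data.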
Tested manually and the combination of:
solves the issue.
With which charm did you experience this, @simskij? You may need to update the charm code to pass `external_url` to `MetricsEndpointProvider` and derive the port from it, e.g. `port = urlparse(self._external_url).port or 80`.
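The port fallback in that snippet can be checked in isolation. Below is a minimal sketch using only the standard library. `scrape_target` is a hypothetical helper, not part of any charm library. It mirrors the `urlparse(...).port or 80` expression from the comment, so a URL without an explicit port is assumed to be served on port 80.

```python
from typing import Tuple
from urllib.parse import urlparse


def scrape_target(external_url: str) -> Tuple[str, int, str]:
    """Hypothetical helper: split an external URL into (host, port, path)."""
    parsed = urlparse(external_url)
    # Same fallback as in the comment: no explicit port means port 80.
    port = parsed.port or 80
    # Behind an ingress, the app is usually exposed under a path prefix.
    path = parsed.path or "/"
    return parsed.hostname or "", port, path


print(scrape_target("http://10.1.2.3:9090/metrics"))         # ('10.1.2.3', 9090, '/metrics')
print(scrape_target("http://traefik.local/cos-prometheus-0"))  # ('traefik.local', 80, '/cos-prometheus-0')
```

Note that the `or 80` fallback ignores the scheme. An `https://` URL without an explicit port would more plausibly need 443.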
I saw it with the loki datasource in grafana after deploying it as a bundle.
If it's a Loki datasource issue, then perhaps it's not related to `prometheus_scrape`? Maybe we need to manually call `update_source` in Loki? BTW, `update_source` seems very different from `refresh_event`. @dstathis @rbarry82
`update_source` is just a superset of `_set_unit_details` that also allows passing additional fields. It was added explicitly so that consumers can say "I have an ingress now, so update out-of-band in case `GrafanaSourceProvider._source_url` from the constructor is out of date".
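The "superset" relationship can be sketched as follows. This is a hypothetical, simplified stand-in, not the actual `grafana_source` library: the `source_url` key and the `**extra` parameter are assumptions chosen for illustration. The private method republishes whatever URL was captured at construction time, while the public method can first replace that URL and then add further fields.

```python
from typing import Dict, Optional


class SourceProviderSketch:
    """Hypothetical, simplified stand-in for a Grafana source provider."""

    def __init__(self, unit_data: Dict[str, str], source_url: str) -> None:
        self._unit_data = unit_data
        self._source_url = source_url  # may be stale if ingress arrives later
        self._set_unit_details()

    def _set_unit_details(self) -> None:
        # Private: publish whatever URL was known at construction time.
        self._unit_data["source_url"] = self._source_url

    def update_source(self, source_url: Optional[str] = None, **extra: str) -> None:
        """Public superset of _set_unit_details: optionally swap the URL, add fields."""
        if source_url:
            self._source_url = source_url
        self._set_unit_details()
        self._unit_data.update(extra)


unit_data: Dict[str, str] = {}
source = SourceProviderSketch(unit_data, "http://10.1.2.4:3100")
source.update_source("http://traefik.local/cos-loki-0")  # called once ingress is ready
print(unit_data)  # {'source_url': 'http://traefik.local/cos-loki-0'}
```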
Since Loki already uses the property in the constructor, `update_source` would need to be called when an ingress is added, yes. That allows setting/updating the Grafana relation data immediately after `ingress_ready` rather than waiting for some other event to re-trigger the constructor. We could do the same thing in `grafana_source` as is done here, but in Loki's codebase it would make more sense to add the call just after `update_endpoint(...)`, since the semantics are the same. The Prometheus libraries have just obsessively avoided having any public API at all that could be used for this purpose.
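A minimal sketch of that placement, with stubs standing in for the real library objects. `EndpointStub`, `SourceStub`, `LokiCharmSketch`, and `on_ingress_ready` are hypothetical, and the one-argument signatures of `update_endpoint` and `update_source` are assumptions made for illustration. The stubs only record their calls, so the sketch shows the ordering inside the ingress handler and nothing else.

```python
from typing import List


class EndpointStub:
    """Stands in for Loki's endpoint provider; the signature is assumed."""

    def __init__(self, calls: List[str]) -> None:
        self.calls = calls

    def update_endpoint(self, url: str) -> None:
        self.calls.append(f"update_endpoint {url}")


class SourceStub:
    """Stands in for GrafanaSourceProvider; the signature is assumed."""

    def __init__(self, calls: List[str]) -> None:
        self.calls = calls

    def update_source(self, source_url: str) -> None:
        self.calls.append(f"update_source {source_url}")


class LokiCharmSketch:
    """Hypothetical charm wiring: refresh both relations as soon as ingress is ready."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.endpoint = EndpointStub(self.calls)
        self.grafana_source = SourceStub(self.calls)

    def on_ingress_ready(self, url: str) -> None:
        # Both relations publish the same externally reachable URL, so refresh them
        # together rather than waiting for another event to re-run the constructor.
        self.endpoint.update_endpoint(url)
        self.grafana_source.update_source(url)


charm = LokiCharmSketch()
charm.on_ingress_ready("http://traefik.local/cos-loki-0")
print(charm.calls)
```

The two calls write to different relations, so their relative order does not matter. What matters is that both happen in the same handler.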
My bad, I saw it in Prometheus too, but it seems to have been resolved now.
Bug Description
See title. If you first relate to Prometheus and then to Traefik, it all works as expected. The other way around, no cigar.
To Reproduce
-
Environment
-
Relevant log output
Additional context
We could have used the ingress established/revoked events, but unfortunately these are fired prematurely.