Closed by sed-i 1 year ago
We should probably examine this more deeply to find out why it is not consistent, rather than just adding more events as a sledgehammer.
I haven't convinced myself yet, but I assume the reason was:
- "relation-changed" was processed by the charm before it was processed by the lib.
- Before this change, the charm wasn't passing "ingress ready" as a refresh event.
The sledgehammer events are now "relation-changed" and "update-status". And now that I think of it, they shouldn't be needed for catalogue: even if the url in reldata is not yet reachable, it doesn't matter for catalogue! Removed in next commit.
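A minimal sketch of the "sledgehammer" pattern described above, assuming a hypothetical consumer class (these names are illustrative, not the real catalogue library API): by re-publishing on every listed refresh event, a URL that was stale when the relation was first joined gets corrected on the next "relation-changed" or "update-status" hook.

```python
class FakeCharm:
    """Minimal stand-in for a charm with named hooks (not the real ops API)."""
    def __init__(self):
        self._observers = {}  # event name -> list of callbacks

    def observe(self, event, callback):
        self._observers.setdefault(event, []).append(callback)

    def emit(self, event):
        for cb in self._observers.get(event, []):
            cb()


class CatalogueConsumerSketch:
    """Hypothetical consumer: re-publishes its URL on every refresh event."""
    def __init__(self, charm, url_getter, refresh_events=()):
        self._url = url_getter
        self.published = []  # record of what was written to relation data
        # "relation-changed" / "update-status" play the sledgehammer role here
        for event in refresh_events:
            charm.observe(event, self._refresh)

    def _refresh(self):
        self.published.append(self._url())


charm = FakeCharm()
# The charm's idea of its URL changes once ingress is set up.
urls = iter(["http://fqdn:3000", "http://ingress/model-grafana"])
consumer = CatalogueConsumerSketch(
    charm, lambda: next(urls),
    refresh_events=("relation-changed", "update-status"),
)
charm.emit("relation-changed")  # publishes the stale pre-ingress FQDN URL
charm.emit("update-status")     # a later refresh picks up the ingress URL
print(consumer.published)       # ['http://fqdn:3000', 'http://ingress/model-grafana']
```

This also illustrates why the extra events are harmless but redundant for catalogue if the first published URL is already the right one.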
I think the question, from my POV, is basically: "is the URL reachable at any point?"
If it eventually becomes reachable, then when? And we're back to the Traefik "watch for accessibility before sending events" topic from a month and a half ago.
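The "watch for accessibility before sending events" idea can be sketched as a retry loop on the provider side (a hypothetical helper, not anything Traefik actually ships): only advertise the URL once a probe stops returning 404.

```python
import time

def wait_until_reachable(probe, attempts=5, delay=0.0):
    """Retry `probe` until it reports the URL is reachable (e.g. an HTTP
    check that no longer 404s), or give up after `attempts` tries.
    Returns True if the URL ever became reachable."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False

# Stub probe standing in for a real HTTP check: it 404s twice, then the
# ingress route is provisioned and the check passes.
responses = iter([False, False, True])
reachable = wait_until_reachable(lambda: next(responses), attempts=5)
print(reachable)  # True: only now would we emit the "ingress ready" event
```

The point of the sketch is the ordering guarantee: consumers never see a URL that has not answered at least once, which would sidestep the 404 window entirely.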
My concern is that it's the behavior we've sporadically observed where there is a "valid" URL but it 404s, and cannot be reached at all (through the catalogue or otherwise) until the relation between ingress and the charm is broken and re-established. If we're seeing that behavior, we should really root cause it. If it's just unreachable for a little while, then yeah, let's leave it and pursue a solution on the ingress side.
> My concern is that it's the behavior we've sporadically observed where there is a "valid" URL but it 404s, and cannot be reached at all (through the catalogue or otherwise) until the relation between ingress and the charm is broken and re-established.
I always encountered the same issue here: the advertised URL is the FQDN, i.e. the URL from before ingress. And re-relating fixed it.
Tested on 4cpu-8gb (load test) and 2cpu-7gb (GH runner) VMs, and it seems to be working now.
```shell
./render_bundle.py local.yaml \
  --prometheus ../prometheus-k8s-operator/prometheus-k8s_ubuntu-20.04-amd64.charm \
  --grafana ../grafana-k8s-operator/grafana-k8s_ubuntu-20.04-amd64.charm
```
> My concern is that it's the behavior we've sporadically observed where there is a "valid" URL but it 404s, and cannot be reached at all (through the catalogue or otherwise) until the relation between ingress and the charm is broken and re-established.
This particular issue I see with grafana every single time I deploy the load test.
> This particular issue I see with grafana every single time I deploy the load test.
This does not seem to be reproducible locally, and nobody from QA has reported this, so I wonder if it's something with the load-test configuration.
Can't find anything wrong with the load-test config.
There, too, re-relating grafana:ingress to traefik fixes the 404.
Ok. Do we have some time to live debug this later this afternoon or tomorrow? This is the same behavior I was seeing with Prometheus in CZ, and I can't reproduce it anymore, but it's almost certainly Traefik. There is no particular reason that re-relating Grafana would change anything, and the relation data should be identical in both cases.
Issue
The 404 fix was confirmed working locally, but only once... In a load-test deployment, the link was still wrong even after that fix.
Solution
Add more refresh events.
Context
Addresses https://github.com/canonical/catalogue-k8s-operator/issues/3.
Testing Instructions
Release Notes
Add more refresh events for catalogue.