Open PietroPasotti opened 2 years ago
I haven't actually tried reproducing these steps, but my spidey sense would be that the first application hasn't finished removing by the time the second is related
Mmmh, that would be possible. However, as far as I can tell, the relation was gone (juju status does not show it, and the whole application was also "gone" in the same sense). I do realize that juju might think otherwise though... My common sense would suggest that if accessing the data gives an error (the data is gone), then accessing the relation should too. That would be an inconsistency in juju, which should unlist the relation before it gets rid of the data (or 'simultaneously').
I agree that it could be a Juju bug. Last time we encountered something similar (applications which were "stuck" and could not be removed in some scenarios), I went through a lot of trace logging in Juju, dumped the database in a bad state, etc. Ultimately, it's Mongo, and there aren't any cascading deletes or strict referential integrity. Juju refcounts inside Mongo documents to know when it's safe to remove an object, and it's spread across a couple of documents.
The trace logging in Juju is... a lot, and I'm not a Juju developer, so determining exactly which loggers I needed to enable was a bit of trial and error, and it's been a couple of months. That particular exception says to me "that relation data still exists, but your application isn't marked as part of that relation, so go away". That may be down to the async nature of the way things are handled (Juju uses its own transaction queue for Mongo to provide assurances around data integrity, so "remove this relation data as part of cleanup" may have been queued up as part of an operation where the tombstone was set to dying, but we can't get rid of the relation itself until we send these events), especially given that relation-broken doesn't provide any contract, that I can remember, around whether the data should still exist. The wording of "as if this relation never existed" implies no.
Either way, it would not be the first "ghost/zombie" we've seen in Juju if something went wrong there.
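To make the ordering I'm speculating about concrete, here is a toy model in Python. It is not Juju code and does not reflect Juju's actual schema or transaction queue: the `relation_doc` and `relation_data` dicts, the `refcount` field, and the three queued steps are all made up for illustration. It only shows how queuing cleanup steps separately can leave a window where a relation is still listed while its data is already gone.

```python
from collections import deque

# Hypothetical stand-ins for two separate documents and a pending-operation queue.
relation_doc = {"id": 1, "life": "alive", "refcount": 1}
relation_data = {1: {"key": "value"}}
pending = deque()


def drop_data():
    # Cleanup step 1: delete the relation data.
    relation_data.pop(1, None)


def drop_reference():
    # Cleanup step 2: release the application's reference to the relation.
    relation_doc["refcount"] -= 1


def finalize():
    # Cleanup step 3: the relation may only die once nothing references it.
    if relation_doc["refcount"] == 0:
        relation_doc["life"] = "dead"


def remove_application():
    # Mark the relation as dying and queue the cleanup steps one by one.
    relation_doc["life"] = "dying"
    pending.extend([drop_data, drop_reference, finalize])


def listed_relations():
    # A relation stays listed until it is fully dead.
    return [] if relation_doc["life"] == "dead" else [relation_doc["id"]]


def data_readable(rel_id):
    return rel_id in relation_data


remove_application()
pending.popleft()()  # only the first queued step has run so far

ghost_listed = listed_relations()  # the relation is still listed...
ghost_readable = data_readable(1)  # ...but its data is already gone

while pending:  # drain the rest of the queue
    pending.popleft()()
final_listed = listed_relations()

print(ghost_listed, ghost_readable, final_listed)
```

If a charm hook runs inside that window, it would see exactly the kind of "ghost" described in this issue. Whether Juju really orders things this way is something I have not verified.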
Bug Description
ModelErrors are fired when prometheus tries to access self.ingress.relation.data, because self.relation simply does self.relations[0] and, apparently, relations contains not one but two relations, the first of which is a ghost (possible juju bug?). The second relation is the one we want.
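A possible charm-side workaround, sketched below, is to stop assuming that relations[0] is usable and instead pick the first relation whose data can actually be read. This is a minimal sketch under stated assumptions, not the library's real code: `ModelError` here is a local stand-in for `ops.model.ModelError`, and `FakeRelation`, `first_readable_relation`, and the `url` key are hypothetical names used so that the example runs without Juju.

```python
class ModelError(Exception):
    """Local stand-in for ops.model.ModelError so the sketch runs without ops."""


class FakeRelation:
    """Hypothetical stand-in for a relation object; not the ops API."""

    def __init__(self, relation_id, app_data=None, ghost=False):
        self.id = relation_id
        self._app_data = app_data or {}
        self._ghost = ghost

    @property
    def data(self):
        # A ghost relation is still listed, but reading its data fails.
        if self._ghost:
            raise ModelError(f"relation {self.id}: data is gone")
        return self._app_data


def first_readable_relation(relations):
    """Return the first relation whose data can be read, or None."""
    for relation in relations:
        try:
            relation.data  # probe: raises ModelError for a ghost
        except ModelError:
            continue
        return relation
    return None


relations = [
    FakeRelation(1, ghost=True),  # leftover from the removed traefik-k8s
    FakeRelation(2, {"url": "http://example.invalid"}),  # the relation to trfk
]
chosen = first_readable_relation(relations)
print(chosen.id)  # prints 2
```

This only hides the symptom on the charm side; if the ghost relation is a Juju bug, it would still need fixing there.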
To Reproduce
juju deploy prometheus-k8s --channel beta
juju deploy traefik-k8s --channel edge
juju relate prometheus-k8s:ingress traefik
juju remove-application traefik-k8s
juju deploy traefik-k8s --channel edge --application-name='trfk'
juju relate prometheus-k8s:ingress trfk
Environment
edge
Relevant log output