sed-i opened this issue 1 year ago
Juju actually allows the charm to access (stale?) relation data in `relation-broken`. However...
I've just tested this using two charms related to one another, each having a `relation-broken` hook that accesses relation data (from "this" or the "remote" app). The Harness actually handles this quite differently from real Juju in a number of cases.

Below are the results. The "setter" charm is the charm in the relation that sets relation data via `event.relation.data[self.app]["key"] = value`, and the "getter" charm is the charm that reads relation data.
| charm  | which app | Juju result  | Harness result          |
|--------|-----------|--------------|-------------------------|
| setter | this      | {...data...} | {...data...}            |
| setter | remote    | {}           | RuntimeError            |
| getter | this      | {}           | RelationDataAccessError |
| getter | remote    | {...data...} | RuntimeError            |
The `RuntimeError`s have the message "remote-side relation data cannot be accessed during a relation-broken event". The `RelationDataAccessError` message is "unit/0 is not leader and cannot read its own application databag", and in `__repr__` this gets caught and converted to the string `<n/a>`.

It seems odd that the Harness deviates so much from real Juju, which just allows reads in each case, even if the data is not useful/stale.
I presume we intentionally raise more errors than Juju in tests to try to catch problems early -- for example, the charm probably shouldn't be accessing remote relation data during `relation-broken` (but Juju lets you). And I'm not sure about the `RelationDataAccessError` for the case when the "getter" charm tries to read its own data -- that doesn't seem correct.
As to the original issue, it seems like the data cannot be accessed (in 3 out of 4 cases!). @sed-i, can you post the actual code you were working with when you ran into this? Were you fetching relation data? Was it in the "getter" or "setter" charm? And was it via `self.app` (this) or `event.app` (remote)?
In any case, we should decide whether we want to mimic real Juju more closely, or raise an error consistently in all cases if you access relation data in `relation-broken`. Is there ever a valid use case for doing that? @jameinel, thoughts?
> can you post the actual code you were working with when you ran into this?
Here's the unit test that expects the charm lib/Juju to take care of cleaning up relation data. Specifically, the test was expecting that whatever custom events fire as a result of `self.harness.remove_relation(rel_id)` would not see any relation data. I think in this test it's the remote data that is expected to go away.
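For illustration only, here is a minimal sketch of the kind of Harness test being described (the charm class, relation name, and data below are placeholders, not the actual o11y code):

```python
from ops.charm import CharmBase
from ops.testing import Harness


class MyCharm(CharmBase):  # placeholder charm with a "db" endpoint
    pass


harness = Harness(MyCharm, meta="""
name: my-charm
requires:
  db:
    interface: db
""")
harness.begin()
rel_id = harness.add_relation("db", "remote-app")
harness.add_relation_unit(rel_id, "remote-app/0")
harness.update_relation_data(rel_id, "remote-app", {"key": "value"})

# remove_relation fires relation-departed/relation-broken (plus any custom
# events a charm lib derives from them); the expectation in the test was
# that handlers for those events would no longer see {"key": "value"}.
harness.remove_relation(rel_id)
```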
> Is there ever a valid use case for doing that?
The pattern we were taking in o11y charms, more often than not, is that relation data represents the most up-to-date state. After a `relation-broken` there is no other event that "reruns the charm" with the updated relation data, so `relation-broken` is the last chance to act on a change. This way, deep charm code doesn't need to know which event it's in (no need for "if the event is `relation-broken` then update everything, ignoring the data").
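As a rough sketch of that holistic pattern (the `metrics` endpoint and the `_render_and_push` helper below are made up for illustration, not actual o11y code):

```python
def _on_metrics_relation_changed(self, _event):
    self._update_config()

def _on_metrics_relation_broken(self, _event):
    # Same code path as every other relation event: no special-casing.
    self._update_config()

def _update_config(self):
    # Regenerate everything from whatever relation data is visible right now.
    data = {}
    for relation in self.model.relations["metrics"]:
        if relation.app is not None:
            data.update(relation.data[relation.app])
    self._render_and_push(data)  # made-up helper that applies the config
```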
I'm probably missing something obvious, but I don't quite understand -- doesn't the above table show that `relation-broken` on real Juju still includes the previous data? So wouldn't expecting the Harness to do something different mean the charm behaves differently in unit tests than it does under real Juju?
I'm not sure I understand the table correctly: On relation broken, the remaining app can read the data that was set by the departed app?
Yeah, that's right -- that's the last row in the table. It shows the "getter" app (i.e., the other charm from the one that set the data) being able to read data that the remote app (the "setter") set. Under real Juju it can read this data; under the Harness you currently get a `RuntimeError("remote-side relation data cannot be accessed during a relation-broken event")`, which doesn't seem to match reality. (@jameinel, any idea why the Harness tries to be different/stricter than reality here?)
You can see this from the following log line:
```
unit-webapp-0: 15:25:26 INFO unit.webapp/0.juju-log db:0: webapp _on_db_relation_broken: <ops.model.Relation db:0> event_data={'db_password_id': 'secret:cf0c1lrs26oc7aah2260'}
```
The webapp charm is the "getter" in this case, and it was able to read that data -- during relation-broken -- that the database charm had set.
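For reference, a handler along these lines would produce that kind of log line (a reconstruction for illustration; only the `db` endpoint and the `_on_db_relation_broken` name come from the log above):

```python
import logging

logger = logging.getLogger(__name__)


def _on_db_relation_broken(self, event):
    # Under real Juju the remote ("setter") app's databag is still readable
    # here, even though the data is stale; the Harness currently raises
    # RuntimeError for this same read.
    event_data = dict(event.relation.data[event.app]) if event.app else {}
    logger.info("webapp _on_db_relation_broken: %s event_data=%s",
                event.relation, event_data)
```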
During `relation-broken` you should be able to read the data from the unit that was on the other end; if we aren't allowing that in the Harness, we should allow it. (My only caveat is that `relation-departed` is really where you should be handling the information; by the time you get to `relation-broken` it's because the relation really is gone, and you're supposed to be finalizing things, not doing more configuration based on the last thing that the remote unit shared with you.)
Note while you can read, you shouldn't be allowed to set data at that point.
I also feel that 'relation-broken' really shouldn't be "just another relation event like all the other ones". Deleting a file on disk is completely different to modifying it, or creating it. So the original argument that "I want a catch all behavior" doesn't really fit for me.
John =:->
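In code terms, that guidance would look roughly like the following split (the handler and helper names here are illustrative only, not an existing API):

```python
def _on_db_relation_departed(self, event):
    # relation-departed is the place to act on the departing unit's last data.
    departing = dict(event.relation.data.get(event.unit, {}))
    self._reconfigure_without(departing)  # hypothetical helper

def _on_db_relation_broken(self, event):
    # By relation-broken the relation is gone: finalize and clean up; don't
    # reconfigure from remote data, and don't write to the relation databags.
    self._remove_db_config()  # hypothetical helper
```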
IIUC, this means that the following pattern is wrong:
```python
def _on_relation_departed(self, _):  # or broken
    self._update_config()  # regenerate everything from current rel data
```
and instead we should do something like:
```python
def _on_relation_departed(self, event):  # or broken
    self._update_config(excluding=event.relation.data)
```
Is that correct?
In other words, from within relation-departed/broken:

- Is `event.relation` included in `self.model.relations`?
- Is `event.relation.data` included in `self.model.relations[x].data`?

Hi everyone, chiming in on this; is what @sed-i proposed the pattern we're supposed to follow?
@PietroPasotti just gave me an idea: if we always defer a `relation-broken` event, then by the next hook (`update-status` at the latest) there will be no data left in relation data, so charm code could operate on the entire relation data, i.e. without needing to work with the delta that a `relation-broken` implies.

This is not a great pattern, but it conveys well our dissonance about `relation-broken`.
FYI, in some cases (but not all) accessing the remote application data in a relation-broken event causes an error.

See:
- https://bugs.launchpad.net/juju/+bug/1960934
- https://github.com/canonical/operator/blob/734e12dcfde93d7081aed5573e011128d98fd84a/ops/model.py#L1341-L1349
- https://github.com/canonical/mysql-router-k8s-operator/issues/73
What the Kubeflow team has seen with istio-pilot is like what @carlcsaposs-canonical reports. In live Juju when handling a relation-broken event, we see either:

1. `event.app=DEPARTING_APP` (and I think `relation.app` is the same thing? probably an alias?)
2. `event.app=None`
Up until today, we had only seen case (1) and whenever we saw it, we also knew that the departing application's data is still in the relation data bag. We handled this by popping the departing data before using the data bag.
```python
# (simplified version - differs slightly from the link)
if isinstance(event, RelationBrokenEvent):
    relation_data.pop((event.relation, event.app))
```
Now that sometimes we see `event.app=None`, I wonder if we should instead do something like:
```python
if isinstance(event, RelationBrokenEvent):
    try:
        relation_data.pop((event.relation, event.app))
    except KeyError:
        pass  # or log a warning?
```
The one question I have is whether, when `event.app == None`, we are guaranteed that the departing application's data has been removed from the databag. If not, that will cause trouble, as we can't `pop` it.
@jameinel Per the above and per https://bugs.launchpad.net/juju/+bug/1960934, it seems Juju is sometimes setting `JUJU_REMOTE_APP` and sometimes not. Do you think that could be fixed on the Juju side? Or I guess we could change Juju to never set it, but that might be too breaking.
Also, I think we usually don't want to see the stale data even on relation-departed:
Real world example:
This issue is related to a frequent point of friction in charming: reconciling holistic vs deltas approaches.
Need to consider further what to do here. Possibly related to https://github.com/canonical/operator/issues/940 work.
Related: https://github.com/canonical/operator/issues/940#issuecomment-1623559826 (difference in usage between local and remote app data during `relation-broken`)
This is fixed in ops 2.10, isn't it?
> This is fixed in ops 2.10, isn't it?
I believe the relation data is still accessible -- and I think it should be, for the local app/unit databags.
@tonyandrewmeyer is going to investigate this further and then we'll make a decision here.
Currently, the Harness first emits `relation-broken` and only then invalidates the data. This means that relation data is still accessible by charm code while inside the `relation-broken` hook. Is that intentional?

Based on the event page it sounds like charm code shouldn't be able to see any data when inside the broken hook.

https://github.com/canonical/operator/blob/4d38ef7ce2fe6dfc4034db5957d9e526ce0cf3d9/ops/testing.py#L697-L702