example of a horrible workaround I had to use: https://github.com/PietroPasotti/traefik-route-k8s-operator/blob/04a32ee3ae67483bf99075fba911e5e62d553801/tests/unit/conftest.py#L82
I believe it's a bit more nuanced than that, but please correct me if I'm wrong. My understanding is that if it is a custom event, then the call is made recursively without exiting, while hooks for proper Juju events would indeed exit between executions.
Very true! We'd have to check if the event being triggered is a custom one or has been fired by charm code. In 'real life' we could discern between 1) events that are being resurrected (because they had been deferred in the past), 2) events that are 'new' and being manually fired by charm code and 3) 'the event' that is causing this execution of the charm.
Only on 3) we want to reinitialize the charm.
If we are not in a live charm, we have to be more clever because there's no context telling us 'this is the event we are running', so a possible workaround would have to set some fancy event flags from within harness.emit().
Defer also sneaks its way into this. Deferred events are run together on the same charm instance/setup as the event that comes in and triggers them.
Another side-effect of this issue is: Harness never fires framework.on.commit, so (advanced) features that depend on that hook firing after every (other) hook execution are not testable with Harness.
@rwcarlsen I wrote a little POC wrapper for us to play with: https://github.com/PietroPasotti/harness-extensions#harness_ctx
A good first run at the problem. I really think this is going to intersect with the event sequence testing work (e.g. #696). This idea of sandboxing a particular charm scenario, then running it and making assertions, seems integral to both of these ideas.
I'm working on ingress for prometheus now and this feature is painfully needed.
@sed-i have you tried out the harness_ctx?
> @sed-i have you tried out the harness_ctx?
What's that? (Didn't find mentions in the usual places.)
> What's that? (Didn't find mentions in the usual places.)
https://github.com/PietroPasotti/harness-extensions#harness_ctx
Afraid it wasn't really advertised, except for a comment on this issue. I'm working on its successor, but in the meanwhile, you could give this a go.
To be honest, I'm not really sure what the intent of this is, but at first glance it seems like an antipattern:
```python
self.relation1 = relation1 = RelationRequirer()
self.relation2 = RelationProvider(relation1.data[self.unit])
```
I really wouldn't want the act of relating A to B to cause A & C's relation to behave wildly differently. At least, as I read this, this says that this charm has multiple possible relation endpoints, and it only responds on a given (related) endpoint when a different related endpoint receives the name of that endpoint on a different relation.
I understand the more general concern about the Harness and the fact that it doesn't simulate the full teardown and refresh that happens when actually processing hooks. But I'd be concerned if this was a motivating use case (unless I'm misunderstanding something)
(at a minimum, relating A to B should not require that B knows the names of the endpoints of charm A. That is certainly an abstraction break)
Because defer means events may actually get run on the same charm instance, I'm not sure about just making a change to Harness here. We could change ops to always re-initialize the charm between running deferred events and between deferred events and the main event. Or we could go all in on "the charm instance may be the same; don't rely on this". In any case, leaving open for now.
Oh, and see also Pietro's work on ops-scenario.
This issue puts charm authors in an odd position: should we add update methods in charm code just so harness picks up the changes? I now have failing unit tests after refactoring and I suspect it's because of this issue.
@sed-i I'm not sure I understand this:
> because under harness they cannot enjoy the benefits of charm re-init. This leads to overcomplicated code.
This must mean that charms are storing state on their charm instance that would lead to "benefits of charm re-init". But isn't that a no-no? That is, charms should never be storing state in the charm instance, because it won't stay around anyway. What kind of state are you / others storing on the charm instance that would be affected by this?
And if your hooks are expecting charm state to be re-inited on every hook, that means defer right now won't work as expected, right? Because currently we don't re-init the charm between running deferred events and the "real" event.
I'd be interested to know what kind of state issue it is that's causing the unit test failures you're seeing.
That said, I'm still open to considering changing ops (and Harness) to reinitialize the charm on every external event, to make things like this easier to avoid and reason about. I wonder if there are any pitfalls / backwards compatibility concerns of doing that.
> I'm not sure I understand this:
> because under harness they cannot enjoy the benefits of charm re-init. This leads to overcomplicated code.
> This must mean that charms are storing state on their charm instance
Not sure what you mean by "storing state", but compare real charm and harness for juju relate:
| | Juju | Harness |
|---|---|---|
| Instruction | `juju relate us:rel them` | `self.harness.add_relation("rel", "them")` |
| Relation data updated | Yes | Yes |
| Charm re-init? | Yes | No |
| Helper objects reconstructed | Yes | No |
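To make the last two rows concrete, here's a minimal sketch (MyCharm and RelationHelper are made-up names, not from a real charm) of the kind of helper object that goes stale under Harness:

```python
import ops

class RelationHelper:
    def __init__(self, charm):
        # Snapshot taken when the charm object is constructed.
        self.related_apps = [
            rel.app.name for rel in charm.model.relations["rel"] if rel.app
        ]

class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        self.helper = RelationHelper(self)  # rebuilt only when the charm is re-inited

# Under Juju, `juju relate us:rel them` starts a fresh dispatch, so __init__ (and the
# helper) run again and see the new relation. Under Harness, the same charm instance
# survives add_relation("rel", "them"), so harness.charm.helper still reflects the
# pre-relation state unless the test rebuilds it by hand.
```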
> And if your hooks are expecting charm state to be re-inited on every hook, that means defer right now won't work as expected, right? Because currently we don't re-init the charm between running deferred events and the "real" event.
Correct, but thankfully we do not use defer.
That being said, processing custom events also does NOT reinit the charm, which is a bit inconvenient.
> I'm still open to considering changing ops (and Harness) to reinitialize the charm on every external event, to make things like this easier to avoid and reason about.
Avoiding needing to reason about it at all would be even better :D But I'm surprised you mention "every external event". Afaiu, ops in fact does reinit the charm on every juju core event.
@benhoyt for additional context, we store an abstraction of the config, relation data, etc. on the charm during __init__. So if an event gets deferred and there are config changes that occur without the charm being re-initialised, then we could have an old copy of the state. It would be good to be consistent with whether the charm is re-initialised and make that a guarantee devs can rely on. I.e., there shouldn't be a difference with how the event is triggered that makes the charm be re-initialised or not. And test behaviour should definitely match production behaviour as closely as possible.
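For illustration, a minimal sketch of that pattern (the State dataclass and option name here are made up, not taken from the actual charm):

```python
import dataclasses
import ops

@dataclasses.dataclass
class State:
    log_level: str

class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        # Snapshot of config taken once, when the charm object is constructed.
        self.state = State(log_level=str(self.config.get("log-level", "info")))
        framework.observe(self.on.config_changed, self._on_config_changed)

    def _on_config_changed(self, event):
        # Fine when every event gets a fresh charm instance; potentially stale when
        # several events (re-emitted deferred events, custom events, or multiple
        # Harness emits) run on the same instance without __init__ re-running.
        print(self.state.log_level)
```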
> in fact does reinit the charm on every juju core event

ops inits the charm once, on main.main(CharmType), i.e. reinits the charm on every juju event.
indeed it's an option to move the charm initialization logic to Framework.reemit() so that a fresh charm instance processes each and every event (juju or custom).
I'm afraid that would break a hell of a lot of code, but maybe it's a good thing.
@benhoyt
I think it would be great if we could come to some kind of conclusion on this so that we can move forward. It's currently a bit hampering that we have disparate behavior in Juju and in Harness.
I've been doing some more thinking about this and discussing with various folks, and I think I understand the cases where this is problematic for charmers, or at the very least confusing.
For example, with @jdkandersson's example, they're storing a snapshot of the current state of the charm in a dataclass in charm __init__, and expecting that to be reinitialised between every hook. I think that expectation is fair, because normally this is the case, that is, when Juju fires the hook. However, when an event is deferred by ops and re-emitted at the next Juju event, the charm instance isn't reinitialised between running the previously-deferred event and the incoming event, so if the deferred event modified config (say), that won't be reflected in the charm.state dataclass instance.
It would be significantly simpler to reason about this if ops did reinitialise the charm instance before every hook call. This would apply to deferred events and Juju events (but not custom events -- I'll comment on #952, but I don't think that's a good idea). And of course we'd then update the Harness to match this behaviour.
I think this would be a backwards-compatible change, because charms shouldn't be relying on the charm instance not being new between events (again, normally it is a new instance, except for deferred events). So you might be able to come up with a contrived example that breaks, but it shouldn't break real charms, or if it does, that's probably exposing an actual bug in the charm.
I'd like to talk this through with @jameinel next week to ensure I'm not missing some historical or other context (he has a lot of background here).
Also, if @PietroPasotti or others have ideas as to how this should be implemented, I'm all ears. It's a bit tricky. Currently the charm is only instantiated once in ops/main.py, so maybe it would mean saving a reference to the charm on the framework, and then manually calling framework.charm.__init__(framework) before every event that's not a custom event.
@jameinel and I discussed this at some length today. His concerns with reinitializing the charm every Juju or deferred event are:
1) We've previously discussed the ability to handle bulk events from Juju (say 100's of relation-joined when there are 100's of units), and we don't want to paint ourselves into a corner where we can't handle bulk events effectively because we're recreating too much (the charm, caches) every event.
2) It's still different for custom events, which won't have the charm reinitialised before emitting them.
I still believe this will be simpler to reason about if ops reinitialised the charm before every hook (deferred and Juju events). That's what charmers expect, primarily because it's what happens most of the time in production (each event comes in as a separate Python invocation). I'd respond to John's points as follows:
1) I don't think we'd paint ourselves into a corner by changing the charm to be reinit'd before every event. The primary overhead in bulk event handling comes from starting up the Python interpreter each time, not creating a few Python objects and doing a few observe calls. So if in future we handle bulk events in one Python interpreter execution, we'd still be able to achieve good performance.
2) That's a good point. However, custom events are different as it is, because they're emitted and the handlers called synchronously on emit. As noted here, it's not feasible to reinit before custom events, in part because they're synchronous. We could change them to be non-synchronous (emitted at the end), but that's a significant behaviour change, and I think would be unexpected in other ways.
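To illustrate what "called synchronously on emit" means here, a minimal sketch (the custom event and handler names are made up):

```python
import ops

class ReadyEvent(ops.EventBase):
    pass

class MyCharmEvents(ops.CharmEvents):
    ready = ops.EventSource(ReadyEvent)

class MyCharm(ops.CharmBase):
    on = MyCharmEvents()

    def __init__(self, framework):
        super().__init__(framework)
        framework.observe(self.on.config_changed, self._on_config_changed)
        framework.observe(self.on.ready, self._on_ready)

    def _on_config_changed(self, event):
        # The ready handlers run right here, inside this hook, on this same charm
        # instance - there's no clean point at which a fresh instance could be made.
        self.on.ready.emit()

    def _on_ready(self, event):
        pass
```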
So I believe it's still best to recreate the charm instance before every deferred and Juju event, for both ops in production and (as much as possible) in Harness.
All that said, unfortunately I think this will have to wait till next cycle, as I've got more than enough this cycle, and this is likely quite tricky to implement. We can't just manually call charm.__init__ again because other attributes won't get reset, and mutable objects will get doubled up, etc. So we have to recreate the charm instance, and figure out how to wire that up. We may need to change how the framework adds attributes to classes rather than instances.
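A rough illustration (a hypothetical charm, not ops internals) of why naively re-calling __init__ isn't enough:

```python
import ops

class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        self.messages = []  # re-running __init__ would recreate this list...
        framework.observe(self.on.start, self._on_start)  # ...but also observe again

    def _on_start(self, event):
        self.messages.append("started")

# If ops simply called framework.charm.__init__(framework) before the next event, the
# start observer would likely end up registered twice on the same framework (so one
# start event runs _on_start twice), and attributes added outside __init__ (e.g. by a
# handler) would survive. Hence the need for a genuinely new charm instance.
```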
I'll add the 24.04 label. In the meantime, charmers need to continue on the current assumption: the charm isn't (necessarily) recreated between events. That's true for production ops (between deferred events and Juju events) as well as Harness testing (one charm instance per Harness instance, with multiple events emitted).
Thanks @benhoyt sounds reasonable
FYI there is an old closed PR where I proposed an implementation for this, harness-only. As for production ops, let's first decide if we want to keep supporting deferred events at all.
If we do, I think the cleanest way to achieve this is to pull the deferred event logic out of Framework and put it into ops.main. ops.main would then, for each deferred event, reinitialize the whole thing from scratch, effectively calling what now is ops.main.main multiple times, once for each deferred event.
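Very roughly, the entrypoint could then look something like this (hypothetical helper names, not the real ops.main code):

```python
import os

def _dispatch_once(charm_class, event_name):
    """Build storage, framework and charm from scratch and emit a single event -
    roughly what ops.main.main does once per process today."""
    print(f"fresh charm handles {event_name}")

def main(charm_class):
    deferred_events = []  # in reality, read back from the unit's stored notices
    for event_name in deferred_events:
        _dispatch_once(charm_class, event_name)  # fresh charm per deferred event
    # Finally, the event Juju actually dispatched this process for.
    _dispatch_once(charm_class, os.environ.get("JUJU_DISPATCH_PATH", ""))
```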
As for reinitializing the charm between deferred events but not custom events, how about custom events that get deferred? I think the price for not doing this uniformly is a big bowl of confusion cereal.
sorry coming to this late...
> I still believe this will be simpler to reason about if ops reinitialised the charm before every hook (deferred and Juju events). That's what charmers expect, primarily because it's what happens most of the time in production (each event comes in as a separate Python invocation).
This nails the whole conversation for me. Whether explicitly or by accident, the contract of an event in a charm has always felt like it included a fresh instantiation of the charm. Maybe this is me understanding the contract wrong, but I'd bet if we polled the charming teams most would have this view.
Extending this to custom events, it's the exact same issue. At minimum, it is not intuitive that a custom event works differently than other events. By calling them all events, it implies they should behave the same way. It is also really hard to notice on your own that they behave differently - you won't know unless it burns you in a way that you can spot, and even then it's probably a lot of debugging before you figure it out.
Another facet of this issue is that in harness tests I need to manually clear collected statuses in between parts of the same test:
```python
self.harness.update_config(bad_cfg)
self.harness.evaluate_status()
self.assertIsInstance(self.harness.charm.unit.status, BlockedStatus)

# Without this, the BlockedStatus persists from before
self.harness.charm.unit._collected_statuses.clear()  # <----- HERE

self.harness.update_config(good_cfg)
self.harness.evaluate_status()
self.assertIsInstance(self.harness.charm.unit.status, ActiveStatus)
```
This :point_up: also means that custom events would never be able to remediate a status, because we only have the add_status interface.
For example, if an earlier func adds a status, which could later on be resolved by a custom event (e.g. ingress lib emits "ingress ready"), then there is no way to clear the previous status, except waiting for the next core hook.
@sed-i I've tried to address the Harness.evaluate_status issue in https://github.com/canonical/operator/pull/1048. I think that's just a bug in the implementation and it should have always cleared the previous collected statuses. Let me know what you think.
@sed-i As for your second comment, can you please give a more specific example of how this would happen? Custom events are fired "synchronously" during a hook execution, so by the time the collect-status event is executed after the hook is done, things will be in a stable state.
Thanks @benhoyt!
> Custom events are fired "synchronously" during a hook execution, so by the time the collect-status event is executed after the hook is done, things will be in a stable state.
That is correct if we only add_status in the charm's collect_status. If components (such as charm libs) observe that event, we may be in trouble. But you rightfully commented that best practice is to only do it in the charm. Currently both are undocumented pitfalls iirc.
For reference, see https://github.com/canonical/alertmanager-k8s-operator/pull/202 for an example of the kind of place this crops up.
this came up in https://github.com/juju/juju-controller/pull/56 as well
@benhoyt is there any effort placed on this bug? This makes harness less reliable in terms of validating that the test results match real life expectations.
We have to do these types of workarounds to avoid the issue:
Hi @gruyaume this is on our work list for the current cycle, and we still expect it to be completed before Madrid.
> Hi @gruyaume this is on our work list for the current cycle, and we still expect it to be completed before Madrid.
Awesome, that's great to know!
cc @dariofaccin
We can have Harness instantiate a new charm object (and have the framework forget the old one, or also create a new Framework object) when it handles the emitting. However, we are left with significant problems.
I tried three main variants of having a fresh charm instance for each (Juju) hook, including reinitialising the charm between begin() and the event, and special-casing begin_with_initial_hooks so that it would only "reinitialise" once rather than 4+ times.

Running this version of Harness against my collection of Canonical charms (currently 144, around 6 of which fail with the latest version of ops), the best case was the latter, and still 61 charms had failing tests, so >50 more than without the change. That's more breakage than we're willing to accept (or to help mitigate).
Some of the issues are actually problems with the charm (for example, there's a charm that does harness._charm when it should do harness.charm - I'll open a PR to fix that), and it's usually a fairly small number of tests that fail, not all of them. The failures all seem resolvable (e.g. by changing the way that mocking is done). However, this is a lot of adjustment across a lot of repos and teams.
I also tried one other approach, where I tracked the attributes set on the charm and raised an error if one was accessed that wasn't present during __init__ (more on this below).
The most recent example in the ticket where this is a problem - and one where the tests are currently doing a charm reinitialisation - is sdcore-upf-k8s. The tests all pass with ops 2.11. Nine of the 76 tests fail if the code that reinitialises the charm is removed; six tests still fail when using the code from above, because it's not cherry-picking when to reset like the existing code does - it's always doing it.
Tests that call emit() themselves

If the test code looks like this:
```python
harness = ops.testing.Harness(MyCharm)
harness.begin()
harness.charm.on.install.emit()
# some asserts
harness.charm.on.start.emit()
```
The Harness object is not involved between those two emits, so there's no opportunity to do the charm reset, without really ugly things like Harness monkeypatching emit(). (This doesn't have to be an actual monkeypatch, since Harness actually creates a fresh charm class and could replace .on with something similar but different, but it feels the same to me.)

Without patching emit(), there's no solution here other than writing the charm tests in a way that doesn't assume that changes to the charm are consistent (see below for why that can be legitimate in tests). We could add to Harness to make this a nice experience, but we can't make it backwards compatible, and our future focus is more on Scenario than on Harness.
I think in general, if you're using Harness it would be nicer to avoid tests written in this style, and limit yourself to a single emit() (whether explicit or via one of the Harness methods) - e.g. have your fixture or setUp create the Harness object and then have other fixtures or utility functions that arrange it appropriately, so that you are almost always using a fresh charm.
Test code that looks like this:
```python
harness = ops.testing.Harness(MyCharm)
harness.begin()
harness.add_relation("foo")
assert harness.charm.x == y
```
I believe this is reasonable to do, and it means that we cannot do the charm reset after Harness emits events, only before.
This is also an issue when a single Harness call emits multiple events - in particular this happens with begin_with_initial_hooks() (at minimum 4 emit()s, plus more if there are containers, storages, relations, etc) and add_relation() when supplying unit data.
We could special-case begin_with_initial_hooks() (and add_relation()) so that they only reset the charm once, although this then leaves us in a strange middle ground where the charm is reset except in some cases. However, that doesn't meaningfully decrease the number of existing charms' tests that doing a reset breaks.
There are tests that look like this:
```python
class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        self.foo = some.lib.object()

def test_something():
    harness = ops.testing.Harness(MyCharm)
    harness.begin()
    harness.charm.foo = Mock()  # probably actually autospec'ing
    harness.add_relation("bar")
    assert ...
```
I believe this is also a reasonable thing to do, and it means that we cannot do the charm reset before Harness emits events, only after (directly contradicting the above finding). If there's code like assert harness.charm.foo.called_once() then it can't be done after, either.
This can be avoided by doing the mocking differently - instead of mocking the charm instance attribute, you can mock the class that it's instantiating and assigning to that attribute, for example. However, there are plenty of charms that have the existing behaviour, and I don't feel that we can legitimately expect them to change (in a minor release).
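For instance, with the example charm above, the mock can target the factory where the charm looks it up (some.lib.object), rather than the attribute on the already-constructed charm:

```python
from unittest.mock import patch

import ops.testing

from charm import MyCharm  # the charm under test (hypothetical module layout)

def test_something():
    # Patching the class/factory means a reset or freshly-built charm would still
    # receive the mock, unlike assigning to harness.charm.foo after begin().
    with patch("some.lib.object") as fake_factory:
        harness = ops.testing.Harness(MyCharm)
        harness.begin()
        harness.add_relation("bar", "remote-app")
        fake_factory.assert_called_once()  # __init__ created exactly one instance
        harness.cleanup()
```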
We could definitely design an API for Harness that removed the ability to modify the charm - for example, we could have a .run method (like _Manager.run) that created the framework and charm instance and did the emitting, and have charmers use that rather than .emit, and also have Harness use it rather than directly emitting. We could have a context manager that gave access to the charm but restricted it to a single emit. There are plenty of other possibilities.
(There's also the tool that Pietro linked above, although I think generally you'd just use Scenario, which is where those ideas moved to.)
However, going forward, our unit testing story is focused on Scenario. Scenario was designed with this issue in mind. Regular use of Scenario, such as:
```python
ctx = scenario.Context(MyCharm)
state2 = ctx.run(event, scenario.State())
assert ...
state3 = ctx.run(event, state2)
```
does not have this issue, because a fresh charm is created each time and the test code does not have access to it.
If you do need to get access to the charm, then you can do this:
```python
ctx = scenario.Context(MyCharm)
with ctx.manager(event, scenario.State()) as mgr:
    mgr.charm.on = Mock()
    mgr.run()
    assert mgr.charm.on.called_once()
```
The run() method of the context manager can only be called once, so, although you have access to the charm, only one (Juju) event will be emitted.
It's possible to write Harness tests that would catch charm code that attempts to persist data on the charm instance. Essentially, all you need to do is ensure that you have one (Juju) event per Harness instance (which probably means per test in most situations). Strictly speaking, you should either avoid begin_with_initial_hooks() or ensure that you also have tests that validate each of the (Juju) events that it triggers, if it's relevant to the charm. You should also avoid passing unit data to add_relation(), instead adding the relation before begin() (if you're testing the relation-changed event).
I feel like writing tests in this style is generally a good approach, but not something that we would want to try and enforce or that would apply to every situation. If people are going to put effort into rewriting tests, we would probably rather that they look into writing Scenario (with the 7.x API) tests instead.
Rather than reinitialising the charm, we could have Harness identify when charms are relying on this behaviour, a little similar to how __slots__ works. Harness could add __getattribute__ and __setattr__ to the charm and raise an error when there's a get before a set, and reset that counter on emitting (Juju) events (by patching the emit method).
I did a proof-of-concept implementation of this approach and didn't get any new charm tests failing, but this example does (on print(self.bar)):
```python
class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        self.foo = 1
        framework.observe(self.on.install, self._on_install)
        framework.observe(self.on.start, self._on_start)

    def _on_install(self, event):
        self.bar = 2
        print(self.foo)

    def _on_start(self, event):
        print(self.bar)

harness = ops.testing.Harness(MyCharm)
harness.begin()
try:
    harness.charm.on.install.emit()
    harness.charm.on.start.emit()
finally:
    harness.cleanup()
```
I could go further into this approach (I haven't spent a huge amount of time on it, so there may be flaws - it may not catch everything it should, or it may flag things it shouldn't and so trigger failures: it does fail on one test_testing test, for example, because that test asserts on an attribute that's added during the event). It only addresses the "people are wrongly relying on attributes persisting across events" issue and not anything else raised in this ticket, though.
Similar to the above, we might be able to build a static check as part of the linter development we're considering. 100% confidence is probably too hard to do statically, but I think it wouldn't be too difficult to have a linter warn when it seems like this might be happening. I haven't explored this in any detail, though.
We could add opt-in support to Harness to make this work - the opted in behaviour would break backwards compatibility in some ways, but allow people to use this behaviour if they wanted to. Pietro links to an old PR above, or I could do a fresh one based on what I had done here, that's based off the more recent ops codebase.
However, when Charm-Tech discussed this, we didn't feel that charms generally would opt in to new behaviour. If someone really does want behaviour like this, then we feel like pointing them at Scenario is the better choice.
__init__ behaves differently per event

For the case in the ticket description, where the charm initialises differently depending on the event: I think we should discourage this as bad practice (basically, what John says). If there is some use-case we're missing there, then I think we could just have an explicit "regenerate charm" Harness method (it would likely also need to regenerate the framework or have the framework forget the old charm and its observers), and people using this pattern could adjust their tests accordingly. I think it would be better to have a ticket specifically for that (with real-world use-cases) rather than mixing it with the "avoid people setting attributes in one event and using them in another" case as in this ticket.
One of the comments above mentioned that Harness doesn't fire the framework events (pre-commit, commit, collect-status). For collect-status, we have Harness.evaluate_status, which seems sufficient (and I don't think we have heard otherwise). For pre-commit/commit, we had #1115 where we could have added those events (which would have triggered the type check). Doing this for the events that Harness emits is simple enough (replacing the various emit() calls with a wrapper that emits and also commits) - doing it when someone does e.g. harness.charm.on.start.emit() can be done but is a lot more work. I think we'd still recommend just using Scenario if you need to test this behaviour.
There's a little bit of discussion above that suggests making deferred events less special, so they are using a "fresh" charm instance. This would be a change to main rather than to testing. I feel that there is some merit to this, and it seems unlikely that it would cause breakage. There would be a small performance penalty added to deferred events, but that shouldn't be noticeable. However, if we decide we want to do this, I think it should be done in a separate ticket and PR, aimed at making deferred events more consistent, rather than being about Harness. It might be worth a discussion now that we've settled on some defer use being appropriate, but it also might not be worth it if we're going to be able to add a Juju re-emit tool in the near future.
There's also discussion about custom events. I'm less convinced about these than deferred ones, but I do feel like breaking that out into a separate ticket and PR would make most sense (probably even separate from and after one for deferred events). If we have a fresh charm for custom events and for deferred events then non-Harness use would be consistent (and likely Scenario could be also), and we have to accept that Harness wasn't designed with this in mind, and works differently. On the other hand, we do currently say that setting charm attributes during event handlers that later get used by collect-status (to avoid duplicate checks) is ok, and that would break.
I do feel like pushing in that direction is the wrong way - it would be so much better if ops didn't reinitialise on every event. However, changes in that area don't seem close, and consistency is definitely an improvement.
Some of the examples linked here have expired, unfortunately - they point to CI logs that have expired, to charms where the code has moved on (rather than to a specific revision), or were changed before being merged (e.g.). I could spend time chasing up each of those with the teams, but it doesn't seem like it would add a lot of additional context here - it does suggest, though, that we might need to do more work on collecting specifics in the ticket rather than relying on external information that might not last.
For what it's worth, I can only find one charm that is still doing any sort of framework._forget, which makes it likely that people aren't reinitialising charms in that sort of way (you could perhaps use .clear or do what _forget does directly, instead). The one is sdcore-upf-k8s-operator (which is not in the list of Canonical charms, so missed in my earlier analysis).
We could give Harness the ability to regenerate the charm instance (so that people can make use of this if they want to), socialise that the ticket exists, and consider putting it on the roadmap if it gets enough support within some time period. I can re-use work I've done here to implement it (or at least re-use thinking).

I realise that this isn't the outcome of this ticket that we were hoping for :disappointed:. It's also a concern that there are recent comments (<1 month) about hitting problems around this.
Thanks @tonyandrewmeyer for the thorough analysis and write-up. Let's discuss further in our 1:1 which of these recommendations we should go further with, if any, and go from there. One thing I think would be useful (in our chat or before) is to look at the recent comments from me, Pietro, and Guillaume pointing to specific issues, and see how you'd recommend they address those specific issues.
Summary, after @benhoyt and I discussed this in detail:
- __init__ should not have code that's conditional on the Juju event (this includes conditional observing). Although the charm is instantiated for each Juju event (outside defer) right now, that may not always be the case, and it's not the ops design. We'll work on improving how the ops story is told to try to improve understanding (in general, no specific action item here). Charms can have guards on observers (including decorators), can have an observer set charm attributes, and so on. We're happy to provide advice to charmers on specific issues around this, and if there are patterns then we'll try to codify those in docs.
- LifecycleEvents (pre-commit, commit): we again recognise that Scenario already provides this for anyone that needs it, and we feel that these are not widely used, so it's not something we should focus on at the moment. If anyone wants this, it should be a new ticket, since it's separate from (although related to) "charm initialisation in Harness".
More specifically, before every hook execution. The point is: when a 'real' charm runs, it is reinitialized afresh. But if you fire eventA and eventB on the harness, the harness won't reinitialize the charm it holds in between, which may result in some major behavioural differences between a live charm and a harness'd charm.
Use cases include:
Concrete example of an affected charm: https://github.com/canonical/prometheus-k8s-operator/blob/main/src/charm.py