example of a horrible workaround I had to use: https://github.com/PietroPasotti/traefik-route-k8s-operator/blob/04a32ee3ae67483bf99075fba911e5e62d553801/tests/unit/conftest.py#L82
I believe it's a bit more nuanced than that, but please correct me if I'm wrong. My understanding is that if it is a custom event, then the call is made recursively without exiting, while hooks for proper Juju events would indeed exit between executions.
Very true! We'd have to check if the event being triggered is a custom one or has been fired by charm code. In 'real life' we could discern between 1) events that are being resurrected (because they had been deferred in the past), 2) events that are 'new' and being manually fired by charm code and 3) 'the event' that is causing this execution of the charm.
Only on 3) we want to reinitialize the charm.
If we are not in a live charm, we have to be more clever because there's no context telling us 'this is the event we are running', so a possible workaround would have to set some fancy event flags from within harness.emit().
Defer also sneaks its way into this. Deferred events are run together on the same charm instance/setup as the event that comes in and triggers them.
Another side-effect of this issue is: Harness never fires framework.on.commit, so (advanced) features that depend on that hook firing after every (other) hook execution are not testable with Harness.
@rwcarlsen I wrote a little POC wrapper for us to play with: https://github.com/PietroPasotti/harness-extensions#harness_ctx
A good first run at the problem. I really think this is going to intersect with the event sequence testing work (e.g. #696). This idea of sandboxing a particular charm scenario, then running it and making assertions, seems integral to both of these ideas.
I'm working on ingress for prometheus now and this feature is painfully needed.
@sed-i have you tried out the harness_ctx?
> @sed-i have you tried out the harness_ctx?
What's that? (Didn't find mentions in the usual places.)
> What's that? (Didn't find mentions in the usual places.)
https://github.com/PietroPasotti/harness-extensions#harness_ctx
Afraid it wasn't really advertised, except for a comment on this issue. I'm working on its successor, but in the meanwhile, you could give this a go.
To be honest, I'm not really sure what the intent of this is, but at first glance it seems like an antipattern:
```python
self.relation1 = relation1 = RelationRequirer()
self.relation2 = RelationProvider(relation1.data[self.unit])
```
I really wouldn't want the act of relating A to B to cause A & C's relation to behave wildly differently. At least, as I read this, this says that this charm has multiple possible relation endpoints, and it only responds on a given (related) endpoint when a different related endpoint receives the name of that endpoint on a different relation.
I understand the more general concern about the Harness and the fact that it doesn't simulate the full teardown and refresh that happens when actually processing hooks. But I'd be concerned if this was a motivating use case (unless I'm misunderstanding something)
(at a minimum, relating A to B should not require that B knows the names of the endpoints of charm A. That is certainly an abstraction break)
Because defer means events may actually get run on the same charm instance, I'm not sure about just making a change to Harness here. We could change ops to always re-initialize the charm between running deferred events and between deferred events and the main event. Or we could go all in on "the charm instance may be the same; don't rely on this". In any case, leaving open for now.
Oh, and see also Pietro's work on ops-scenario.
This issue puts charm authors in an odd position: should we add update methods in charm code just so harness picks up the changes? I now have failing unit tests after refactoring and I suspect it's because of this issue.
@sed-i I'm not sure I understand this:
> because under harness they cannot enjoy the benefits of charm re-init. This leads to overcomplicated code.
This must mean that charms are storing state on their charm instance that would lead to "benefits of charm re-init". But isn't that a no-no? That is, charms should never be storing state in the charm instance, because it won't stay around anyway. What kind of state are you / others storing on the charm instance that would be affected by this?
And if your hooks are expecting charm state to be re-inited on every hook, that means defer right now won't work as expected, right? Because currently we don't re-init the charm between running deferred events and the "real" event.
I'd be interested to know what kind of state issue it is that's causing the unit test failures you're seeing.
That said, I'm still open to considering changing ops (and Harness) to reinitialize the charm on every external event, to make things like this easier to avoid and reason about. I wonder if there are any pitfalls / backwards compatibility concerns of doing that.
> I'm not sure I understand this:
> because under harness they cannot enjoy the benefits of charm re-init. This leads to overcomplicated code.
> This must mean that charms are storing state on their charm instance
Not sure what you mean by "storing state", but compare real charm and harness for juju relate:
| | Juju | Harness |
|---|---|---|
| Instruction | `juju relate us:rel them` | `self.harness.add_relation("rel", "them")` |
| Relation data updated | Yes | Yes |
| Charm re-init? | Yes | No |
| Helper objects reconstructed | Yes | No |
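To make the last two rows concrete, here's a minimal sketch (MyCharm and RelationHelper are made-up names, not from a real charm) of the kind of helper object that goes stale under Harness:

```python
import ops

class RelationHelper:
    def __init__(self, charm):
        # Snapshot taken when the charm object is constructed.
        self.related_apps = [
            rel.app.name for rel in charm.model.relations["rel"] if rel.app
        ]

class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        self.helper = RelationHelper(self)  # rebuilt only when the charm is re-inited

# Under Juju, `juju relate us:rel them` starts a fresh dispatch, so __init__ (and the
# helper) run again and see the new relation. Under Harness, the same charm instance
# survives add_relation("rel", "them"), so harness.charm.helper still reflects the
# pre-relation state unless the test rebuilds it by hand.
```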
> And if your hooks are expecting charm state to be re-inited on every hook, that means defer right now won't work as expected, right? Because currently we don't re-init the charm between running deferred events and the "real" event.
Correct, but thankfully we do not use defer.
That being said, processing custom events also does NOT reinit the charm, which is a bit inconvenient.
> I'm still open to considering changing ops (and Harness) to reinitialize the charm on every external event, to make things like this easier to avoid and reason about.
Avoiding needing to reason about it at all would be even better :D But I'm surprised you mention "every external event". Afaiu, ops in fact does reinit the charm on every juju core event.
@benhoyt for additional context, we store an abstraction of the config, relation data, etc. on the charm during __init__. So if an event gets deferred and there are config changes that occur without the charm being re-initialised, then we could have an old copy of the state. It would be good to be consistent with whether the charm is re-initialised and make that a guarantee devs can rely on. I.e., there shouldn't be a difference with how the event is triggered that makes the charm be re-initialised or not. And test behaviour should definitely match production behaviour as closely as possible.
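For illustration, a minimal sketch of that pattern (the State dataclass and option name here are made up, not taken from the actual charm):

```python
import dataclasses
import ops

@dataclasses.dataclass
class State:
    log_level: str

class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        # Snapshot of config taken once, when the charm object is constructed.
        self.state = State(log_level=str(self.config.get("log-level", "info")))
        framework.observe(self.on.config_changed, self._on_config_changed)

    def _on_config_changed(self, event):
        # Fine when every event gets a fresh charm instance; potentially stale when
        # several events (re-emitted deferred events, custom events, or multiple
        # Harness emits) run on the same instance without __init__ re-running.
        print(self.state.log_level)
```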
> in fact does reinit the charm on every juju core event

ops inits the charm once, on main.main(CharmType), i.e. reinits the charm on every juju event.
indeed it's an option to move the charm initialization logic to Framework.reemit() so that a fresh charm instance processes each and every event (juju or custom).
I'm afraid that would break a hell of a lot of code, but maybe it's a good thing.
@benhoyt
I think it would be great if we could come to some kind of conclusion on this so that we can move forward. It's currently a bit hampering that we have disparate behavior in Juju and in Harness.
I've been doing some more thinking about this and discussing with various folks, and I think I understand the cases where this is problematic for charmers, or at the very least confusing.
For example, with @jdkandersson's example, they're storing a snapshot of the current state of the charm in a dataclass in charm __init__, and expecting that to be reinitialised between every hook. I think that expectation is fair, because normally this is the case, that is, when Juju fires the hook. However, when an event is deferred by ops and re-emitted at the next Juju event, the charm instance isn't reinitialised between running the previously-deferred event and the incoming event, so if the deferred event modified config (say), that won't be reflected in the charm.state dataclass instance.
It would be significantly simpler to reason about this if ops did reinitialise the charm instance before every hook call. This would apply to deferred events and Juju events (but not custom events -- I'll comment on #952, but I don't think that's a good idea). And of course we'd then update the Harness to match this behaviour.
I think this would be a backwards-compatible change, because charms shouldn't be relying on the charm instance not being new between events (again, normally it is a new instance, except for deferred events). So you might be able to come up with a contrived example that breaks, but it shouldn't break real charms, or if it does, that's probably exposing an actual bug in the charm.
I'd like to talk this through with @jameinel next week to ensure I'm not missing some historical or other context (he has a lot of background here).
Also, if @PietroPasotti or others have ideas as to how this should be implemented, I'm all ears. It's a bit tricky. Currently the charm is only instantiated once in ops/main.py, so maybe it would mean saving a reference to the charm on the framework, and then manually calling framework.charm.__init__(framework) before every event that's not a custom event.
@jameinel and I discussed this at some length today. His concerns with reinitializing the charm every Juju or deferred event are:
1) We've previously discussed the ability to handle bulk events from Juju (say 100's of relation-joined when there are 100's of units), and we don't want to paint ourselves into a corner where we can't handle bulk events effectively because we're recreating too much (the charm, caches) every event.
2) It's still different for custom events, which won't have the charm reinitialised before emitting them.
I still believe this will be simpler to reason about if ops reinitialised the charm before every hook (deferred and Juju events). That's what charmers expect, primarily because it's what happens most of the time in production (each event comes in as a separate Python invocation). I'd respond to John's points as follows:
1) I don't think we'd paint ourselves into a corner by changing the charm to be reinit'd before every event. The primary overhead in bulk event handling comes from starting up the Python interpreter each time, not creating a few Python objects and doing a few observe calls. So if in future we handle bulk events in one Python interpreter execution, we'd still be able to achieve good performance.
2) That's a good point. However, custom events are different as it is, because they're emitted and the handlers called synchronously on emit. As noted here, it's not feasible to reinit before custom events, in part because they're synchronous. We could change them to be non-synchronous (emitted at the end), but that's a significant behaviour change, and I think would be unexpected in other ways.
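To illustrate what "called synchronously on emit" means here, a minimal sketch (the custom event and handler names are made up):

```python
import ops

class ReadyEvent(ops.EventBase):
    pass

class MyCharmEvents(ops.CharmEvents):
    ready = ops.EventSource(ReadyEvent)

class MyCharm(ops.CharmBase):
    on = MyCharmEvents()

    def __init__(self, framework):
        super().__init__(framework)
        framework.observe(self.on.config_changed, self._on_config_changed)
        framework.observe(self.on.ready, self._on_ready)

    def _on_config_changed(self, event):
        # The ready handlers run right here, inside this hook, on this same charm
        # instance - there's no clean point at which a fresh instance could be made.
        self.on.ready.emit()

    def _on_ready(self, event):
        pass
```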
So I believe it's still best to recreate the charm instance before every deferred and Juju event, for both ops in production and (as much as possible) in Harness.
All that said, unfortunately I think this will have to wait till next cycle, as I've got more than enough this cycle, and this is likely quite tricky to implement. We can't just manually call charm.__init__ again because other attributes won't get reset, and mutable objects will get doubled up, etc. So we have to recreate the charm instance, and figure out how to wire that up. We may need to change how the framework adds attributes to classes rather than instances.
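A rough illustration (a hypothetical charm, not ops internals) of why naively re-calling __init__ isn't enough:

```python
import ops

class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        self.messages = []  # re-running __init__ would recreate this list...
        framework.observe(self.on.start, self._on_start)  # ...but also observe again

    def _on_start(self, event):
        self.messages.append("started")

# If ops simply called framework.charm.__init__(framework) before the next event, the
# start observer would likely end up registered twice on the same framework (so one
# start event runs _on_start twice), and attributes added outside __init__ (e.g. by a
# handler) would survive. Hence the need for a genuinely new charm instance.
```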
I'll add the 24.04 label. In the meantime, charmers need to continue on the current assumption: the charm isn't (necessarily) recreated between events. That's true for production ops (between deferred events and Juju events) as well as Harness testing (one charm instance per Harness instance, with multiple events emitted).
Thanks @benhoyt sounds reasonable
FYI there is an old closed PR where I proposed an implementation for this, harness-only. As for production ops, let's first decide if we want to keep supporting deferred events at all.
If we do, I think the cleanest way to achieve this is to pull the deferred event logic out of Framework and put it into ops.main. ops.main would then, for each deferred event, reinitialize the whole thing from scratch, effectively calling what now is ops.main.main multiple times, once for each deferred event.
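Very roughly, the entrypoint could then look something like this (hypothetical helper names, not the real ops.main code):

```python
import os

def _dispatch_once(charm_class, event_name):
    """Build storage, framework and charm from scratch and emit a single event -
    roughly what ops.main.main does once per process today."""
    print(f"fresh charm handles {event_name}")

def main(charm_class):
    deferred_events = []  # in reality, read back from the unit's stored notices
    for event_name in deferred_events:
        _dispatch_once(charm_class, event_name)  # fresh charm per deferred event
    # Finally, the event Juju actually dispatched this process for.
    _dispatch_once(charm_class, os.environ.get("JUJU_DISPATCH_PATH", ""))
```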
As for reinitializing the charm between deferred events but not custom events, how about custom events that get deferred? I think the price for not doing this uniformly is a big bowl of confusion cereal.
sorry coming to this late...
> I still believe this will be simpler to reason about if ops reinitialised the charm before every hook (deferred and Juju events). That's what charmers expect, primarily because it's what happens most of the time in production (each event comes in as a separate Python invocation).
This nails the whole conversation for me. Whether explicitly or by accident, the contract of an event in a charm has always felt like it included a fresh instantiation of the charm. Maybe this is me understanding the contract wrong, but I'd bet if we polled the charming teams most would have this view.
Extending this to custom events, it's the exact same issue. At minimum, it is not intuitive that a custom event works differently than other events. By calling them all events, it implies they should behave the same way. It is also really hard to notice on your own that they behave differently - you won't know unless it burns you in a way that you can spot, and even then it's probably a lot of debugging before you figure it out.
Another facet of this issue is that in harness tests I need to manually clear collected statuses in between parts of the same test:
```python
self.harness.update_config(bad_cfg)
self.harness.evaluate_status()
self.assertIsInstance(self.harness.charm.unit.status, BlockedStatus)

# Without this, the BlockedStatus persists from before
self.harness.charm.unit._collected_statuses.clear()  # <----- HERE

self.harness.update_config(good_cfg)
self.harness.evaluate_status()
self.assertIsInstance(self.harness.charm.unit.status, ActiveStatus)
```
This :point_up: also means that custom events would never be able to remediate a status, because we only have the add_status interface.
For example, if an earlier func adds a status, which could later on be resolved by a custom event (e.g. ingress lib emits "ingress ready"), then there is no way to clear the previous status, except waiting for the next core hook.
@sed-i I've tried to address the Harness.evaluate_status issue in https://github.com/canonical/operator/pull/1048. I think that's just a bug in the implementation and it should have always cleared the previous collected statuses. Let me know what you think.
@sed-i As for your second comment, can you please give a more specific example of how this would happen? Custom events are fired "synchronously" during a hook execution, so by the time the collect-status event is executed after the hook is done, things will be in a stable state.
Thanks @benhoyt!
> Custom events are fired "synchronously" during a hook execution, so by the time the collect-status event is executed after the hook is done, things will be in a stable state.
That is correct if we only add_status in the charm's collect_status. If components (such as charm libs) observe that event, we may be in trouble. But you rightfully commented that best practice is to only do it in the charm. Currently both are undocumented pitfalls iirc.
For reference, see https://github.com/canonical/alertmanager-k8s-operator/pull/202 for an example of the kind of place this crops up.
this came up in https://github.com/juju/juju-controller/pull/56 as well
@benhoyt is there any effort placed on this bug? This makes harness less reliable in terms of validating that the test results match real life expectations.
We have to do these types of workarounds to avoid the issue:
Hi @gruyaume this is on our work list for the current cycle, and we still expect it to be completed before Madrid.
> Hi @gruyaume this is on our work list for the current cycle, and we still expect it to be completed before Madrid.
Awesome, that's great to know!
cc @dariofaccin
We can have Harness instantiate a new charm object (and have the framework forget the old one, or also create a new Framework object) when it handles the emitting. However, we are left with significant problems.
I tried three main variants of having a fresh charm instance for each (Juju) hook, including reinitialising the charm between begin() and the event, and special-casing begin_with_initial_hooks so that it would only "reinitialise" once rather than 4+ times.

Running this version of Harness against my collection of Canonical charms (currently 144, around 6 of which fail with the latest version of ops), the best case was the latter, and still 61 charms had failing tests, so >50 more than without the change. That's more breakage than we're willing to accept (or to help mitigate).
Some of the issues are actually problems with the charm (for example, there's a charm that does harness._charm when it should do harness.charm - I'll open a PR to fix that), and it's usually a fairly small number of tests that fail, not all of them. The failures all seem resolvable (e.g. by changing the way that mocking is done). However, this is a lot of adjustment across a lot of repos and teams.
I also tried one other approach, where I tracked the attributes set on the charm and raised an error if one was accessed that wasn't present during __init__ (more on this below).
The most recent example in the ticket where this is a problem - and one where the tests are currently doing a charm reinitialisation - is sdcore-upf-k8s. The tests all pass with ops 2.11. Nine of the 76 tests fail if the code that reinitialises the charm is removed; six tests still fail when using the code from above, because it's not cherry-picking when to reset like the existing code does - it's always doing it.
Tests that call emit() themselves

If the test code looks like this:
```python
harness = ops.testing.Harness(MyCharm)
harness.begin()
harness.charm.on.install.emit()
# some asserts
harness.charm.on.start.emit()
```
The Harness object is not involved between those two emits, so there's no opportunity to do the charm reset, without really ugly things like Harness monkeypatching emit(). (This doesn't have to be an actual monkeypatch, since Harness actually creates a fresh charm class and could replace .on with something similar but different, but it feels the same to me.)

Without patching emit(), there's no solution here other than writing the charm tests in a way that doesn't assume that changes to the charm are consistent (see below for why that can be legitimate in tests). We could add to Harness to make this a nice experience, but we can't make it backwards compatible, and our future focus is more on Scenario than on Harness.
I think in general, if you're using Harness it would be nicer to avoid tests written in this style, and limit yourself to a single emit() (whether explicit or via one of the Harness methods) - e.g. have your fixture or setUp create the Harness object and then have other fixtures or utility functions that arrange it appropriately, so that you are almost always using a fresh charm.
Test code that looks like this:
```python
harness = ops.testing.Harness(MyCharm)
harness.begin()
harness.add_relation("foo")
assert harness.charm.x == y
```
I believe this is reasonable to do, and it means that we cannot do the charm reset after Harness emits events, only before.
This is also an issue when a single Harness call emits multiple events - in particular this happens with begin_with_initial_hooks() (at minimum 4 emit()s, plus more if there are containers, storages, relations, etc) and add_relation() when supplying unit data.
We could special-case begin_with_initial_hooks() (and add_relation()) so that they only reset the charm once, although this then leaves us in a strange middle ground where the charm is reset except in some cases. However, that doesn't meaningfully decrease the number of existing charms' tests that doing a reset breaks.
There are tests that look like this:
```python
class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        self.foo = some.lib.object()

def test_something():
    harness = ops.testing.Harness(MyCharm)
    harness.begin()
    harness.charm.foo = Mock()  # probably actually autospec'ing
    harness.add_relation("bar")
    assert ...
```
I believe this is also a reasonable thing to do, and it means that we cannot do the charm reset before Harness emits events, only after (directly contradicting the above finding). If there's code like assert harness.charm.foo.called_once() then it can't be done after, either.
This can be avoided by doing the mocking differently - instead of mocking the charm instance attribute, you can mock the class that it's instantiating and assigning to that attribute, for example. However, there are plenty of charms that have the existing behaviour, and I don't feel that we can legitimately expect them to change (in a minor release).
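For instance, with the example charm above, the mock can target the factory where the charm looks it up (some.lib.object), rather than the attribute on the already-constructed charm:

```python
from unittest.mock import patch

import ops.testing

from charm import MyCharm  # the charm under test (hypothetical module layout)

def test_something():
    # Patching the class/factory means a reset or freshly-built charm would still
    # receive the mock, unlike assigning to harness.charm.foo after begin().
    with patch("some.lib.object") as fake_factory:
        harness = ops.testing.Harness(MyCharm)
        harness.begin()
        harness.add_relation("bar", "remote-app")
        fake_factory.assert_called_once()  # __init__ created exactly one instance
        harness.cleanup()
```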
We could definitely design an API for Harness that removed the ability to modify the charm - for example, we could have a .run method (like _Manager.run) that created the framework and charm instance and did the emitting, and have charmers use that rather than .emit, and also have Harness use it rather than directly emitting. We could have a context manager that gave access to the charm but restricted it to a single emit. There are plenty of other possibilities.
(There's also the tool that Pietro linked above, although I think generally you'd just use Scenario, which is where those ideas moved to.)
However, going forward, our unit testing story is focused on Scenario. Scenario was designed with this issue in mind. Regular use of Scenario, such as:
```python
ctx = scenario.Context(MyCharm)
state2 = ctx.run(event, scenario.State())
assert ...
state3 = ctx.run(event, state2)
```
does not have this issue, because a fresh charm is created each time and the test code does not have access to it.
If you do need to get access to the charm, then you can do this:
```python
ctx = scenario.Context(MyCharm)
with ctx.manager(event, scenario.State()) as mgr:
    mgr.charm.on = Mock()
    mgr.run()
    assert mgr.charm.on.called_once()
```
The run() method of the context manager can only be called once, so, although you have access to the charm, only one (Juju) event will be emitted.
It's possible to write Harness tests that would catch charm code that attempts to persist data on the charm instance. Essentially, all you need to do is ensure that you have one (Juju) event per Harness instance (which probably means per test in most situations). Strictly speaking, you should either avoid begin_with_initial_hooks() or ensure that you also have tests that validate each of the (Juju) events that it triggers, if it's relevant to the charm. You should also avoid passing unit data to add_relation(), instead adding the relation before begin() (if you're testing the relation-changed event).
I feel like writing tests in this style is generally a good approach, but not something that we would want to try and enforce or that would apply to every situation. If people are going to put effort into rewriting tests, we would probably rather that they look into writing Scenario (with the 7.x API) tests instead.
Rather than reinitialising the charm, we could have Harness identify when charms are relying on this behaviour, a little similar to how __slots__ works. Harness could add __getattribute__ and __setattr__ to the charm and raise an error when there's a get before a set, and reset that counter on emitting (Juju) events (by patching the emit method).
I did a proof-of-concept implementation of this approach and didn't get any new charm tests failing, but this example does (on print(self.bar)):
```python
class MyCharm(ops.CharmBase):
    def __init__(self, framework):
        super().__init__(framework)
        self.foo = 1
        framework.observe(self.on.install, self._on_install)
        framework.observe(self.on.start, self._on_start)

    def _on_install(self, event):
        self.bar = 2
        print(self.foo)

    def _on_start(self, event):
        print(self.bar)

harness = ops.testing.Harness(MyCharm)
harness.begin()
try:
    harness.charm.on.install.emit()
    harness.charm.on.start.emit()
finally:
    harness.cleanup()
```
I could go further into this approach (I haven't spent a huge amount of time on it, so there may be flaws - it may not catch everything it should, or it may flag things it shouldn't and so trigger failures: it does fail on one test_testing test, for example, because that test asserts on an attribute that's added during the event). It only addresses the "people are wrongly relying on attributes persisting across events" issue and not anything else raised in this ticket, though.
Similar to the above, we might be able to build a static check as part of the linter development we're considering. 100% confidence is probably too hard to do statically, but I think it wouldn't be too difficult to have a linter warn when it seems like this might be happening. I haven't explored this in any detail, though.
We could add opt-in support to Harness to make this work - the opted in behaviour would break backwards compatibility in some ways, but allow people to use this behaviour if they wanted to. Pietro links to an old PR above, or I could do a fresh one based on what I had done here, that's based off the more recent ops codebase.
However, when Charm-Tech discussed this, we didn't feel that charms generally would opt in to new behaviour. If someone really does want behaviour like this, then we feel like pointing them at Scenario is the better choice.
__init__ behaves differently per event

For the case in the ticket description, where the charm initialises differently depending on the event: I think we should discourage this as bad practice (basically, what John says). If there is some use-case we're missing there, then I think we could just have an explicit "regenerate charm" Harness method (it would likely also need to regenerate the framework or have the framework forget the old charm and its observers), and people using this pattern could adjust their tests accordingly. I think it would be better to have a ticket specifically for that (with real-world use-cases) rather than mixing it with the "avoid people setting attributes in one event and using them in another" case as in this ticket.
One of the comments above mentioned that Harness doesn't fire the framework events (pre-commit, commit, collect-status). For collect-status, we have Harness.evaluate_status, which seems sufficient (and I don't think we have heard otherwise). For pre-commit/commit, we had #1115 where we could have added those events (which would have triggered the type check). Doing this for the events that Harness emits is simple enough (replacing the various emit() calls with a wrapper that emits and also commits) - doing it when someone does e.g. harness.charm.on.start.emit() can be done but is a lot more work. I think we'd still recommend just using Scenario if you need to test this behaviour.
There's a little bit of discussion above that suggests making deferred events less special, so they are using a "fresh" charm instance. This would be a change to main rather than to testing. I feel that there is some merit to this, and it seems unlikely that it would cause breakage. There would be a small performance penalty added to deferred events, but that shouldn't be noticeable. However, if we decide we want to do this, I think it should be done in a separate ticket and PR, aimed at making deferred events more consistent, rather than being about Harness. It might be worth a discussion now that we've settled on some defer use being appropriate, but it also might not be worth it if we're going to be able to add a Juju re-emit tool in the near future.
There's also discussion about custom events. I'm less convinced about these than deferred ones, but I do feel like breaking that out into a separate ticket and PR would make most sense (probably even separate from and after one for deferred events). If we have a fresh charm for custom events and for deferred events then non-Harness use would be consistent (and likely Scenario could be also), and we have to accept that Harness wasn't designed with this in mind, and works differently. On the other hand, we do currently say that setting charm attributes during event handlers that later get used by collect-status (to avoid duplicate checks) is ok, and that would break.
I do feel like pushing in that direction is the wrong way - it would be so much better if ops didn't reinitialise on every event. However, changes in that area don't seem close, and consistency is definitely an improvement.
Some of the examples linked here have expired, unfortunately - they point to CI logs that have expired, to charms where the code has moved on (rather than to a specific revision), or were changed before being merged (e.g.). I could spend time chasing up each of those with the teams, but it doesn't seem like it would add a lot of additional context here - it does suggest, though, that we might need to do more work on collecting specifics in the ticket rather than relying on external information that might not last.
For what it's worth, I can only find one charm that is still doing any sort of framework._forget, which makes it likely that people aren't reinitialising charms in that sort of way (you could perhaps use .clear or do what _forget does directly, instead). The one is sdcore-upf-k8s-operator (which is not in the list of Canonical charms, so missed in my earlier analysis).
We could give Harness the ability to regenerate the charm instance (so that people can make use of this if they want to), socialise that the ticket exists, and consider putting it on the roadmap if it gets enough support within some time period. I can re-use work I've done here to implement it (or at least re-use thinking).

I realise that this isn't the outcome of this ticket that we were hoping for :disappointed:. It's also a concern that there are recent comments (<1 month) about hitting problems around this.
Thanks @tonyandrewmeyer for the thorough analysis and write-up. Let's discuss further in our 1:1 which of these recommendations we should go further with, if any, and go from there. One thing I think would be useful (in our chat or before) is to look at the recent comments from me, Pietro, and Guillaume pointing to specific issues, and see how you'd recommend they address those specific issues.
Summary, after @benhoyt and I discussed this in detail:
- __init__ should not have code that's conditional on the Juju event (this includes conditional observing). Although the charm is instantiated for each Juju event (outside defer) right now, that may not always be the case, and it's not the ops design. We'll work on improving how the ops story is told to try to improve understanding (in general, no specific action item here). Charms can have guards on observers (including decorators), can have an observer set charm attributes, and so on. We're happy to provide advice to charmers on specific issues around this, and if there are patterns then we'll try to codify those in docs.
- LifecycleEvents (pre-commit, commit): we again recognise that Scenario already provides this for anyone that needs it, and we feel that these are not widely used, so it's not something we should focus on at the moment. If anyone wants this, it should be a new ticket, since it's separate from (although related to) "charm initialisation in Harness".
More specifically, before every hook execution. The point is: when a 'real' charm runs, it is reinitialized afresh. But if you fire eventA and eventB on the harness, the harness won't reinitialize the charm it holds in between, which may result in some major behavioural differences between a live charm and a harness'd charm.
Use cases include:
Concrete example of an affected charm: https://github.com/canonical/prometheus-k8s-operator/blob/main/src/charm.py