Closed ajkavanagh closed 3 years ago
The interface layer is using @when conditions to set or clear the cluster.available flag. Because the order of invocation of @when handlers isn't defined, this can lead to a race condition where the charm's handler gets invoked before the interface layer has a chance to clear the flag. The safest way to fix this is to switch to Endpoint.manage_flags(), which is guaranteed to allow the interface to set up all flags prior to any charm handlers running. It should also simplify the flag management logic in the layer.
It's not really about the race. It's that the handler still gets called even though the condition it is decorated with isn't being honored. I.e. the framework isn't doing what it says it is doing: it is calling a handler when a condition isn't true. That's a bug, I think. Solving it may not be trivial, as I expect the framework needs to re-evaluate the conditions just before a handler is called to verify that it should still be called.
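To make the "re-evaluate just before the call" idea concrete, here is a minimal toy sketch. It is not charms.reactive's dispatcher; the dispatch() and run() functions, the recheck parameter, and the handler names are all hypothetical and exist only for illustration. It shows how a handler that was queued while cluster.available was set can still run after an earlier handler cleared the flag, unless the conditions are checked again immediately before invocation.

```python
from typing import Callable, List, Set, Tuple

# A handler is modelled as (required flags, function). Hypothetical toy model,
# not the charms.reactive implementation.
Handler = Tuple[Set[str], Callable[[], None]]


def dispatch(flags: Set[str], handlers: List[Handler], recheck: bool) -> None:
    # Queue every handler whose required flags are all set right now.
    queued = [h for h in handlers if h[0] <= flags]
    for required, func in queued:
        # With recheck=True the conditions are re-evaluated immediately
        # before the call, so a flag cleared by an earlier handler is honored.
        if recheck and not (required <= flags):
            continue
        func()


def run(recheck: bool) -> List[str]:
    flags = {'cluster.available'}
    calls: List[str] = []

    def clear_available() -> None:
        # Stands in for the interface layer clearing its flag.
        calls.append('clear_available')
        flags.discard('cluster.available')

    def use_cluster() -> None:
        # Stands in for a charm handler that relies on the flag.
        calls.append('use_cluster')

    handlers: List[Handler] = [
        ({'cluster.available'}, clear_available),
        ({'cluster.available'}, use_cluster),
    ]
    dispatch(flags, handlers, recheck)
    return calls
```

In this toy model, run(False) invokes use_cluster even though the flag has already been cleared, while run(True) skips it.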
The alternative is to say: "actually, it's a reasonable endeavour, but the caller of endpoint_from_flag() must check the return value to ensure that it is still valid at that point in time". That's a slightly different contract, but needs to be made clear, I think.
It looks like the handler invoked was scale_out, which has the following signature/conditions:
@reactive.when('endpoint.cluster.changed.unit-configure-ready')
@reactive.when('leadership.set.cluster-instances-clustered')
@reactive.when('leadership.is_leader')
def scale_out():
    """Handle scale-out adding new nodes to an existing cluster."""
    ...
I.e. there is no check on the flag. So the moral of the story is probably that we should always check the return value of endpoint_from_flag(), as the handler's conditions may not have checked the flag. I think the behaviour has been seen, but this isn't it. Sorry for the incorrect bug report; we'll keep an eye out for this in the future.
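A rough sketch of that defensive check, for illustration only. FakeCluster and joined_unit_count() are hypothetical names, and the lookup function is passed in as a parameter so the example runs without the framework; in a real charm the lookup would be endpoint_from_flag() itself. The assumption here is that either the endpoint or its all_joined_units may be None when the flag isn't set.

```python
from typing import Callable, List, Optional


class FakeCluster:
    """Hypothetical stand-in for the cluster interface endpoint."""

    def __init__(self, all_joined_units: Optional[List[str]]) -> None:
        self.all_joined_units = all_joined_units


def joined_unit_count(
    endpoint_from_flag: Callable[[str], Optional[FakeCluster]],
) -> int:
    # The handler is not gated on cluster.available, so the lookup result
    # has to be validated before it is used.
    cluster = endpoint_from_flag('cluster.available')
    if cluster is None or cluster.all_joined_units is None:
        # Nothing usable yet; back off and wait for a later invocation.
        return 0
    return len(cluster.all_joined_units)
```

The other option is to add the flag to the handler's own conditions, so the guard is expressed in the decorators rather than in the handler body.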
Essentially, a handler gets called despite not matching the conditions on the handler. An example:
From mysql-innodb-cluster (and resulting in bug https://bugs.launchpad.net/charm-mysql-innodb-cluster/+bug/1896809):
Note that the all_joined_units property is being accessed on the cluster obtained from the endpoint_from_flag() call.
The stack-trace:
i.e. all_joined_units is None. I don't think that this should be possible, as the handler shouldn't be able to be invoked unless the @when('cluster.available') is truth-y.