Closed by phvalguima 1 month ago
I believe we should check whether the state is either UPGRADING or HEALTHY.
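A minimal sketch of the kind of check suggested above. The `UnitState` enum, its members other than UPGRADING and HEALTHY, and the `can_resume` helper are all hypothetical names used for illustration; they are not taken from the charm's code.

```python
from enum import Enum


class UnitState(Enum):
    """Hypothetical unit states; only HEALTHY and UPGRADING come from the thread."""

    HEALTHY = "healthy"
    UPGRADING = "upgrading"
    RESTARTING = "restarting"
    OUTDATED = "outdated"


# States in which the check should pass, per the suggestion above.
ACCEPTED_STATES = frozenset({UnitState.HEALTHY, UnitState.UPGRADING})


def can_resume(state: UnitState) -> bool:
    """Return True if the unit state is either HEALTHY or UPGRADING."""
    return state in ACCEPTED_STATES
```

With this shape, `can_resume(UnitState.UPGRADING)` and `can_resume(UnitState.HEALTHY)` both pass, while any other state is rejected.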
Likewise, we have another point where that is a problem here
I don't think this is a bug
the highest unit should have upgraded & be healthy before the upgrade is resumed (without force)
for history, conclusion: issue (reason why resume-upgrade failed) was
unit-failover-1: 12:39:13 INFO unit.failover/1.juju-log Current health of cluster: ignore
unit-failover-1: 12:39:13 ERROR unit.failover/1.juju-log Cluster is not healthy after upgrade. Manual intervention required. To rollback, `juju refresh` to the previous revision
and cluster health (checked here: https://github.com/canonical/opensearch-operator/blob/6670d19650144de9b08d549554b4cb51bbb3c1f0/lib/charms/opensearch/v0/opensearch_base_charm.py#L985) should not have returned ignore
As discussed with @carlcsaposs-canonical, the issue was in the `self.health.apply` call, and the fix is moving to `self.health.get`.
`resume-upgrade` fails if the leader unit is running on the unit with the highest identifier.
Using pdb, I can confirm the following, on:
The charm will fail as `state` reports:
Full Status: