phvalguima opened this issue 8 months ago
In run: https://github.com/canonical/opensearch-operator/actions/runs/8032823682/job/21942692208
I can see the following status:
Model                         Controller           Cloud/Region         Version  SLA          Timestamp
test-horizontal-scaling-ri5x  github-pr-1b962-lxd  localhost/localhost  3.1.7    unsupported  21:23:10Z

App                       Version  Status  Scale  Charm                     Channel  Rev  Exposed  Message
opensearch                         active      5  opensearch                           0  no
self-signed-certificates           active      1  self-signed-certificates  stable    72  no

Unit                         Workload  Agent  Machine  Public address  Ports  Message
opensearch/3*                active    idle   4        10.217.138.76
opensearch/4                 active    idle   5        10.217.138.124
opensearch/6                 active    idle   7        10.217.138.64
opensearch/7                 waiting   idle   8        10.217.138.239         Awaiting service operation
opensearch/8                 active    idle   9        10.217.138.160
self-signed-certificates/0*  active    idle   0        10.217.138.75

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.217.138.75   juju-29c99e-0  ubuntu@22.04      Running
4        started  10.217.138.76   juju-29c99e-4  ubuntu@22.04      Running
5        started  10.217.138.124  juju-29c99e-5  ubuntu@22.04      Running
7        started  10.217.138.64   juju-29c99e-7  ubuntu@22.04      Running
8        started  10.217.138.239  juju-29c99e-8  ubuntu@22.04      Running
9        started  10.217.138.160  juju-29c99e-9  ubuntu@22.04      Running

Integration provider                   Requirer                     Interface         Type     Message
opensearch:opensearch-peers            opensearch:opensearch-peers  opensearch_peers  peer
opensearch:service                     opensearch:service           rolling_op        peer
self-signed-certificates:certificates  opensearch:certificates      tls-certificates  regular

Storage Unit  Storage ID         Type        Pool    Mountpoint                   Size     Status    Message
opensearch/3  opensearch-data/3  filesystem  rootfs  /var/snap/opensearch/common  145 GiB  attached
opensearch/4  opensearch-data/4  filesystem  rootfs  /var/snap/opensearch/common  145 GiB  attached
opensearch/6  opensearch-data/6  filesystem  rootfs  /var/snap/opensearch/common  145 GiB  attached
opensearch/7  opensearch-data/7  filesystem  rootfs  /var/snap/opensearch/common  145 GiB  attached
opensearch/8  opensearch-data/8  filesystem  rootfs  /var/snap/opensearch/common  145 GiB  attached
The unit opensearch/7 is never promoted to leader: https://pastebin.ubuntu.com/p/wBgdY8m4Vz/
However, I can see in its logs:
unit-opensearch-7: 2024-02-24 20:49:49 DEBUG unit.opensearch/7.juju-log service:0: Deferring <RunWithLock via OpenSearchOperatorCharm/on/service_run_with_lock[244]>.
unit-opensearch-7: 2024-02-24 20:49:49 DEBUG unit.opensearch/7.juju-log service:0: Emitting custom event <ProcessLocks via OpenSearchOperatorCharm/on/service_process_locks[245]>.
That happens because this logic: https://github.com/canonical/charm-rolling-ops/blob/4bae5c031cb7a8d5fd3819ace1f1496c87c0aae4/lib/charms/rolling_ops/v0/rollingops.py#L408 compares the unit recorded in the lock created within RunWithLock, which is Lock(self).unit (the unit's own lock), against self.model.unit. According to the docstring in the operator framework, self.model.unit is "the unit that is running this code": https://github.com/canonical/operator/blob/1836df5affb42b3183125b1904c794090aa1862b/ops/model.py#L132
Therefore, that comparison should always be true.
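For context, here is a condensed paraphrase of the check around rollingops.py#L408 (not the verbatim library code; the Lock and event plumbing are simplified), showing why every unit, leader or not, takes this branch:

```python
# Paraphrased sketch of RollingOpsManager._on_run_with_lock; names follow
# charm-rolling-ops, but this is a simplification, not the actual source.
def _on_run_with_lock(self, event) -> None:
    lock = Lock(self)  # the lock belonging to the unit running this hook

    # Lock(self).unit and self.model.unit both resolve to "the unit that
    # is running this code", so this comparison is always true: every
    # unit defers RunWithLock and re-emits ProcessLocks, which matches
    # the opensearch/7 log above.
    if lock.unit == self.model.unit:
        event.defer()
        getattr(self.charm.on, f"{self.name}_process_locks").emit()
```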
I consider this error a red herring only because the process-locks routine checks whether the unit is the leader right at its beginning. So, the only real issue is the unneeded events emitted on all the other units.
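To make that concrete, a minimal sketch of the handler's entry point (again paraphrased, assuming the leadership check sits at the top as described) shows why non-leaders never get past the first line:

```python
def _process_locks(self, event) -> None:
    # Paraphrased: the real routine starts with a leadership check, so on
    # non-leader units the deferred RunWithLock and re-emitted ProcessLocks
    # events are dropped here and never reach the lock-granting logic.
    if not self.charm.unit.is_leader():
        return
    # ... leader-only logic: inspect every unit's lock and grant the next ...
```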
https://warthogs.atlassian.net/browse/DPE-3668