Hi team, just came across this issue in one of our opensearch runs:
1) The network cut test is finishing up.
2) Juju tries to reach the cut-off instance right before the test restores the network:
2024-09-12T22:52:07.3489453Z machine-1: 22:51:43 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [35e1b8] "machine-1" cannot open api: unable to connect to API: dial tcp 10.114.131.230:17070: connect: no route to host
3) The network is restored.
4) The unit is set to failure right after, because the attempt to run the update-status hook failed.
Let's wait for the model to settle after each restore_network_cut call, retrying with tenacity:
def restore_network_cut(...):
    ....
    try_wait_model_settles(wait_list)

@tenacity.retry(...)
def try_wait_model_settles(wait_list):
    # Wait for the listed units; if none are specified, wait for the entire model to settle
    wait_for_idle(...)
After step 4, wait_for_idle fails because a unit is in error.
Full CI run: https://github.com/canonical/opensearch-dashboards-operator/actions/runs/10831485374/job/30079586227