canonical / opensearch-operator

OpenSearch operator
Apache License 2.0

ha/test_ha_networking.py fails to run lxc command since lxd 5.21 #227

Closed carlcsaposs-canonical closed 7 months ago

carlcsaposs-canonical commented 7 months ago

Test passed on 2024-04-11 with lxd 5.20-f3dd836 (snap rev 27049): https://github.com/canonical/opensearch-operator/actions/runs/8640068334/job/23687547198

Test failed on 2024-04-12 with lxd 5.21.1-43998c6 (snap rev 28155): https://github.com/canonical/opensearch-operator/actions/runs/8655512868/job/23734702946#step:21:1686

 _________ test_full_network_cut_without_ip_change_node_with_elected_cm _________
Traceback (most recent call last):
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/_pytest/runner.py", line 341, in from_call
    result: Optional[TResult] = func()
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/_pytest/runner.py", line 262, in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_callers.py", line 181, in _multicall
    return outcome.get_result()
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_result.py", line 99, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_callers.py", line 102, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/_pytest/runner.py", line 177, in pytest_runtest_call
    raise e
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/_pytest/runner.py", line 169, in pytest_runtest_call
    item.runtest()
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/_pytest/python.py", line 1792, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_callers.py", line 181, in _multicall
    return outcome.get_result()
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_result.py", line 99, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pluggy/_callers.py", line 102, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/home/runner/work/opensearch-operator/opensearch-operator/.tox/integration/lib/python3.10/site-packages/pytest_asyncio/plugin.py", line 532, in inner
    _loop.run_until_complete(task)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/runner/work/opensearch-operator/opensearch-operator/tests/integration/ha/test_ha_networking.py", line 330, in test_full_network_cut_without_ip_change_node_with_elected_cm
    await cut_network_from_unit_without_ip_change(ops_test, app, first_elected_cm_unit_id)
  File "/home/runner/work/opensearch-operator/opensearch-operator/tests/integration/ha/helpers.py", line 347, in cut_network_from_unit_without_ip_change
    subprocess.check_call(limit_set_command.split())
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['lxc', 'config', 'set', 'juju-fa995e-2', 'limits.network.priority=10']' returned non-zero exit status 1.
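
The traceback above reports only the exit status, because subprocess.check_call does not capture the command's stderr. As a rough sketch for diagnosing failures like this one, a wrapper could capture stderr and include it in the raised error. The name run_and_report is hypothetical and is not part of tests/integration/ha/helpers.py; what lxc actually prints for this command is not shown in this issue.

```python
import subprocess
from typing import Sequence


def run_and_report(cmd: Sequence[str]) -> str:
    """Run cmd and return its stdout.

    On a non-zero exit status, raise RuntimeError with the command,
    the exit status, and whatever the command wrote to stderr.
    (Hypothetical helper, for illustration only.)
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{' '.join(cmd)} exited with {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

Calling it with limit_set_command.split() in place of subprocess.check_call would put the lxc error text directly into the test log.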

https://chat.canonical.com/canonical/pl/877qxrhr37d99n3qibhgdayyra

github-actions[bot] commented 7 months ago

https://warthogs.atlassian.net/browse/DPE-4055

phvalguima commented 7 months ago

This is the original deprecation notice: https://github.com/canonical/lxd/issues/12419

I can change the NIC's specific options in lxd 5.21 as follows:

lxc config device set juju-f89d82-0 eth0 limits.priority=0

phvalguima commented 7 months ago

I've tested the following commands in lxc, using 5.20 from latest/stable.

Script results: https://pastebin.ubuntu.com/p/dk2MstxbbK/

This test is not doing what is expected in 5.20 either.

Mehdi-Bendriss commented 7 months ago

@phvalguima Until last week, the test was passing, and so doing what it was expected to. Example: a successful run on main with lxd (5.20/stable) 5.20-f3dd836.

On this PR, where Carl pins LXD to 5.20, the test passes with lxd (5.20/stable) 5.20-f3dd836.

phvalguima commented 7 months ago

@Mehdi-Bendriss indeed, the unit goes to lost for a period of time. Still, I think we should give lxc config device set juju-f89d82-0 eth0 limits.priority=0 a shot before deciding to pin the version.
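
A minimal sketch of how the test helper might build the per-NIC command proposed above, next to the instance-level command from the failing traceback. The function names are hypothetical and do not come from helpers.py. The device name eth0 is taken from the command in this thread. Whether the command succeeds when eth0 is inherited from a profile rather than defined on the instance is an open assumption here.

```python
import subprocess
from typing import Callable, List


def legacy_priority_command(machine: str, priority: int) -> List[str]:
    """Instance-level form, as seen in the failing traceback."""
    return ["lxc", "config", "set", machine, f"limits.network.priority={priority}"]


def nic_priority_command(machine: str, priority: int, device: str = "eth0") -> List[str]:
    """Per-NIC form proposed in this thread for lxd 5.21."""
    return [
        "lxc", "config", "device", "set",
        machine, device, f"limits.priority={priority}",
    ]


def set_network_priority(
    machine: str,
    priority: int,
    device: str = "eth0",
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> None:
    """Apply the per-NIC priority; `run` is injectable so tests can avoid lxc."""
    run(nic_priority_command(machine, priority, device))
```

Passing run=calls.append (with calls an empty list) records the command instead of executing it, which allows checking the argument list on a machine without LXD.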

Mehdi-Bendriss commented 7 months ago

@phvalguima agreed. That is exactly what this issue is for: to track and test the fix you proposed with LXD 5.21. Please make a PR with the fix, referencing this issue (DPE-4055). In the meantime, to avoid blocking other PRs, pinning LXD to the latest working version is a sensible temporary solution.