
Packing multiple charms (?) on CI fails with "Instance is busy" message #1138

Closed: DnPlas closed this issue 1 year ago

DnPlas commented 1 year ago

Bug Description

The Charmed Kubeflow team has a repository with 7 charms (canonical/kfp-operators), whose CI builds and deploys all seven. Building this many charms in parallel may be causing an issue, as we get the following error when calling built_charms = await ops_test.build_charms(*charms_to_build.values()):

RuntimeError: Failed to build charm /home/runner/work/kfp-operators/kfp-operators/charms/kfp-schedwf:
Packing the charm.
Launching environment to pack for base name='ubuntu' channel='20.04' architectures=['amd64'] (may take a while the first time but it's reusable)
charmcraft internal error: LXDError(brief="Failed to copy instance 'local:charmcraft-kfp-schedwf-1070911-0-0-amd64' to 'local:base-instance-charmcraft-buildd-base-v10-07bde40a62a827cdc5a4'.", details='* Command that failed: \'lxc --project charmcraft copy local:charmcraft-kfp-schedwf-1070911-0-0-amd64 local:base-instance-charmcraft-buildd-base-v10-07bde40a62a827cdc5a4\'\n* Command exit code: 1\n* Command standard error output: b\'Error: Failed creating instance record: Instance is busy running a "create" operation\\n\'', resolution=None)
Full execution log: '/home/runner/.local/state/charmcraft/log/charmcraft-20230704-165315.306636.log'


It sometimes fails with:

charmcraft internal error: LXDError(brief="Failed to create project 'charmcraft'.", details='* Command that failed: \'lxc --project charmcraft project create local:charmcraft\'\n* Command exit code: 1\n* Command standard error output: b\'Error: Failed creating project "charmcraft": Failed adding database record: This "projects" entry already exists\\n\'', resolution=None)
Full execution log: '/home/runner/.local/state/charmcraft/log/charmcraft-20230704-172534.136770.log'

Same as #1132


Workaround

For me, using --destructive-mode works, but it is of course far from ideal.

To Reproduce

I haven't been able to reproduce this on my local dev machine, but it happens on GitHub runners:

  1. Clone https://github.com/canonical/kfp-operators/tree/main
  2. Apply this patch

    diff --git a/.github/workflows/integrate.yaml b/.github/workflows/integrate.yaml
    index 9d8b7fe..704ad15 100644
    --- a/.github/workflows/integrate.yaml
    +++ b/.github/workflows/integrate.yaml
    @@ -133,7 +133,7 @@ jobs:
           # Remove destructive-mode once these bugs are fixed:
           # https://github.com/canonical/charmcraft/issues/554
           # https://github.com/canonical/craft-providers/issues/96
    -          tox -e bundle-integration -- --model kubeflow --destructive-mode
    +          tox -e bundle-integration -- --model kubeflow
    
       - name: Get all
         run: kubectl get all -A
  3. Submit a PR and observe the "Run Tests / Test the bundle" job fail

Environment

charmcraft.yaml

N/A

Relevant log output

=================================== FAILURES ===================================
____________________________ test_build_and_deploy _____________________________
Traceback (most recent call last):
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/_pytest/runner.py", line 341, in from_call
    result: Optional[TResult] = func()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/_pytest/runner.py", line 262, in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_hooks.py", line 433, in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_manager.py", line 112, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_callers.py", line 155, in _multicall
    return outcome.get_result()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_result.py", line 108, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_callers.py", line 80, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/_pytest/runner.py", line 177, in pytest_runtest_call
    raise e
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/_pytest/runner.py", line 169, in pytest_runtest_call
    item.runtest()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/_pytest/python.py", line 1788, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_hooks.py", line 433, in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_manager.py", line 112, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_callers.py", line 155, in _multicall
    return outcome.get_result()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_result.py", line 108, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pluggy/_callers.py", line 80, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pytest_asyncio/plugin.py", line 525, in inner
    _loop.run_until_complete(task)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/runner/work/kfp-operators/kfp-operators/tests/integration/test_kfp_generic_installation.py", line 200, in test_build_and_deploy
    built_charms = await ops_test.build_charms(*charms_to_build.values())
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pytest_operator/plugin.py", line 1028, in build_charms
    charms = await asyncio.gather(
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration/lib/python3.8/site-packages/pytest_operator/plugin.py", line 1011, in build_charm
    raise RuntimeError(
RuntimeError: Failed to build charm /home/runner/work/kfp-operators/kfp-operators/charms/kfp-api:
Packing the charm.
Launching environment to pack for base name='ubuntu' channel='20.04' architectures=['amd64'] (may take a while the first time but it's reusable)
charmcraft internal error: LXDError(brief="Failed to create project 'charmcraft'.", details='* Command that failed: \'lxc --project charmcraft project create local:charmcraft\'\n* Command exit code: 1\n* Command standard error output: b\'Error: Failed creating project "charmcraft": Failed adding database record: This "projects" entry already exists\\n\'', resolution=None)
Full execution log: '/home/runner/.local/state/charmcraft/log/charmcraft-20230704-145156.175803.log'
lengau commented 1 year ago

I'm not sure this is necessarily intended behaviour, but I can at least tell you what's causing it and how to work around it in the meantime: it's occurring because multiple charmcraft instances are run in parallel and started nearly simultaneously, leading to a race condition in craft-providers.

In the log output above, the race is occurring when multiple instances of charmcraft try to create the charmcraft project simultaneously (running LXC.project_create). You should be able to work around that by creating an lxc project called charmcraft before trying to run any charmcraft instances.
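For example (a sketch, not a verified fix: the fixture name is made up, it assumes lxc is on the runner's PATH, and the project name comes from the error output above), a session-scoped pytest fixture in conftest.py could pre-create the project once before any builds start:

    import subprocess

    import pytest


    @pytest.fixture(scope="session", autouse=True)
    def precreate_charmcraft_lxd_project():
        # Create the LXD project charmcraft expects before any charmcraft
        # instance runs; a non-zero exit (project already exists) is ignored.
        subprocess.run(["lxc", "project", "create", "charmcraft"], check=False)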

However, from the linked action step, it's failing at a different point — specifically, when creating a base instance.

To get your CI working without destructive mode, the most stable workaround is probably to run only one instance of charmcraft at a time. If the charms all build on the same base (which it looks like they do), you might be able to build one charm on its own and then build the rest simultaneously.
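As a rough sketch inside the failing test (reusing charms_to_build and ops_test from the traceback above; build_charm is the pytest-operator helper that build_charms itself calls for each charm):

    # Build one charm serially so charmcraft creates the reusable base
    # instance, then build the remaining charms in parallel against it.
    paths = list(charms_to_build.values())
    first = await ops_test.build_charm(paths[0])
    remaining = await ops_test.build_charms(*paths[1:])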

lengau commented 1 year ago

I'm going to mark this as a duplicate of #1132. Once the issue is solved in craft-providers, we can release a charmcraft update that resolves it.

sergiusens commented 1 year ago

This was not solved by #1132

sergiusens commented 1 year ago

We will first create a test reproducer for this to run in Spread and verify the failure, then update craft-providers with the fix from https://github.com/canonical/craft-providers/issues/341, which should fix the issue.