canonical / github-runner-operator

github-runner-operator - charm repository.
Apache License 2.0

VMs fail to start #25

Open vmpjdc opened 1 year ago

vmpjdc commented 1 year ago

I'm using revision 4 of this charm from edge, and VMs are failing to start.

2022-12-07 20:56:36 ERROR unit.xlarge/0.juju-log server.go:319 Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-xlarge-0/charm/src/runner.py", line 235, in create
    self._start_instance(instance)
  File "/var/lib/juju/agents/unit-xlarge-0/charm/src/runner.py", line 468, in _start_instance
    instance.start(wait=True)
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/pylxd/models/instance.py", line 363, in start
    return self._set_state("start", timeout=timeout, force=force, wait=wait)
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/pylxd/models/instance.py", line 348, in _set_state
    self.client.operations.wait_for_operation(response.json()["operation"])
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/pylxd/models/operation.py", line 57, in wait_for_operation
    operation.wait()
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/pylxd/models/operation.py", line 98, in wait
    raise exceptions.LXDAPIException(response)
pylxd.exceptions.LXDAPIException: Failed to run: forklimits limit=memlock:unlimited:unlimited -- /snap/lxd/23991/bin/qemu-system-x86_64 -S -name xlarge-z3vxfu -uuid c994191b-f9f8-44f8-b3c9-0c19f72c4d76 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/xlarge-z3vxfu/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/xlarge-z3vxfu/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/xlarge-z3vxfu/qemu.pid -D /var/snap/lxd/common/lxd/logs/xlarge-z3vxfu/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : Process exited with non-zero value -1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/charm.py", line 293, in <module>
    main(GithubRunnerOperator)
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/ops/main.py", line 429, in main
    framework.reemit()
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/ops/framework.py", line 753, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/ops/framework.py", line 790, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 88, in _on_install
    self._reconcile_runners(runner_manager)
  File "./src/charm.py", line 247, in _reconcile_runners
    delta_virtual_machines = runner_manager.reconcile(
  File "/var/lib/juju/agents/unit-xlarge-0/charm/src/runner.py", line 199, in reconcile
    self.create(image="ubuntu", virt=virt_type, vm_resources=vm_resources)
  File "/var/lib/juju/agents/unit-xlarge-0/charm/src/runner.py", line 250, in create
    instance.stop(wait=True)
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/pylxd/models/instance.py", line 367, in stop
    return self._set_state("stop", timeout=timeout, force=force, wait=wait)
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/pylxd/models/instance.py", line 348, in _set_state
    self.client.operations.wait_for_operation(response.json()["operation"])
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/pylxd/models/operation.py", line 57, in wait_for_operation
    operation.wait()
  File "/var/lib/juju/agents/unit-xlarge-0/charm/venv/pylxd/models/operation.py", line 98, in wait
    raise exceptions.LXDAPIException(response)
pylxd.exceptions.LXDAPIException: The instance is already stopped

LXD 4.0.9-eb5e237 is installed, which seems quite old (although still maintained). Nothing useful is logged:

ubuntu@juju-e4d256-prod-github-runner-9:~$ sudo lxc start xlarge-6hdwni
Error: Failed to run: forklimits limit=memlock:unlimited:unlimited -- /snap/lxd/23991/bin/qemu-system-x86_64 -S -name xlarge-6hdwni -uuid dd306a5b-6ace-4971-9519-f020ef7c989e -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/xlarge-6hdwni/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/xlarge-6hdwni/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/xlarge-6hdwni/qemu.pid -D /var/snap/lxd/common/lxd/logs/xlarge-6hdwni/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : Process exited with non-zero value -1
Try `lxc info --show-log xlarge-6hdwni` for more info
ubuntu@juju-e4d256-prod-github-runner-9:~$ sudo lxc info --show-log xlarge-6hdwni
Name: xlarge-6hdwni
Location: none
Remote: unix://
Architecture: x86_64
Created: 2022/12/07 21:18 UTC
Status: Stopped
Type: virtual-machine (ephemeral)
Profiles: default, runner, xlarge-6hdwni

Log:

ubuntu@juju-e4d256-prod-github-runner-9:~$ _

I switched to LXD 5.8 (--channel latest) and that works.

jdkandersson commented 1 year ago

We just need to change the snap install command so that the latest LXD version is installed.

jdkandersson commented 1 year ago

Reference for setting up LXD: https://github.com/charmed-kubernetes/actions-operator/blob/ced11ec027a9849a9d8b8b635905b24880dd4ee0/src/bootstrap/index.ts#L141-L156
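
As a rough illustration of the snap change discussed above, the following sketch pins LXD to a newer channel. It is not the charm's actual code: the function names (`lxd_snap_command`, `lxd_installed`, `ensure_lxd`) and the `latest/stable` default are assumptions made for this example, and it presumes the charm runs as root with `snap` on the PATH.

```python
import subprocess
from typing import List

# Assumption: track the "latest" channel, as the reporter did with LXD 5.8.
LXD_CHANNEL = "latest/stable"


def lxd_snap_command(installed: bool, channel: str = LXD_CHANNEL) -> List[str]:
    """Build the snap command that puts LXD on the requested channel.

    `snap install` is used on a fresh machine; `snap refresh` switches an
    already installed snap (for example 4.0.9) to the new channel.
    """
    action = "refresh" if installed else "install"
    return ["snap", action, "lxd", f"--channel={channel}"]


def lxd_installed() -> bool:
    """Return True if the lxd snap is present (`snap list lxd` exits 0)."""
    result = subprocess.run(
        ["snap", "list", "lxd"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def ensure_lxd(channel: str = LXD_CHANNEL) -> None:
    """Install or refresh LXD; raises CalledProcessError if snap fails."""
    subprocess.run(lxd_snap_command(lxd_installed(), channel), check=True)
```

Something like `ensure_lxd()` could be called from the install hook before any runner is created. Whether to follow `latest/stable` or pin a specific track is a judgment call: `latest` picks up fixes quickly but can also change behaviour between refreshes.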