canonical / lxd


lxc failing with `Error: mkdir /var/snap/lxd/common/lxd/shmounts: file exists` when using snap in parallel mode #12168

Open jameinel opened 1 year ago

jameinel commented 1 year ago

Required information

Issue description

`lxc` is failing to start containers. I first noticed this while running `juju_29 bootstrap lxd lxd` after doing a parallel install of the juju snap. However, the failure reproduces with `lxc launch` alone, without Juju involved.

Steps to reproduce

  1. Try to launch an LXD container:
    $ lxc launch juju/ubuntu@20.04/amd64
    Creating the instance
    Instance name is: proven-mule
    Starting proven-mule
    Error: mkdir /var/snap/lxd/common/lxd/shmounts: file exists
    Try `lxc info --show-log local:proven-mule` for more info
  2. Looking at the directory, that path does exist (as a symlink):

    $ ll /var/snap/lxd/common/lxd/shmounts
    lrwxrwxrwx 1 root root 39 May 10 11:15 /var/snap/lxd/common/lxd/shmounts -> /var/snap/lxd/common/shmounts/instances

    However, the target it points to does not:

    $ sudo ls -al /var/snap/lxd/common/shmounts
    total 8
    drwx--x--x 2 root root 4096 Jan 20 16:00 .
    drwxr-xr-x 9 root root 4096 May 10 11:15 ..

I can manually delete the symlink, or manually create the instances directory, but I'm not sure what permissions should be used. I don't know whether Juju is somehow using an older LXD client library version that set something up incorrectly. However, the juju snap shouldn't have any rights to write into those directories anyway, so I'm pretty sure it is LXD itself (the daemon) that sets those things up.
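The symlink state described above can be checked with a small helper. This is only a diagnostic sketch: the function name `check_symlink` is made up for illustration, and it reports what is visible from the host, which may not match what LXD sees inside the snap's environment.

```shell
# Hypothetical helper (not part of LXD or snapd): classify a path as
# "not-a-symlink", "ok" (symlink with an existing target), or
# "dangling" (symlink whose target is missing).
check_symlink() {
    path="$1"
    if [ ! -L "$path" ]; then
        # -L is true only when the path itself is a symbolic link.
        echo "not-a-symlink"
    elif [ -e "$path" ]; then
        # -e follows the link, so it succeeds only if the target exists.
        echo "ok"
    else
        echo "dangling"
    fi
}
```

On an affected machine, `check_symlink /var/snap/lxd/common/lxd/shmounts` would be expected to print `dangling` if the state matches the listings in the steps above. It may need to run under `sudo` depending on directory permissions.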

Information to attach

There are only 2 lines in /var/snap/lxd/common/lxd/logs/lxd.log:

time="2023-05-10T11:15:35-04:00" level=warning msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
time="2023-05-10T11:15:35-04:00" level=warning msg="Instance type not operational" driver=qemu err="KVM support is missing (no /dev/kvm)" type=virtual-machine

tomponline commented 1 year ago

Does this occur on all (or a fresh) system? Or just this particular machine?

stgraber commented 1 year ago

Moved over to the snap packaging repo

jameinel commented 1 year ago

It happened for 2 relatively fresh systems in my testing.

jameinel commented 1 year ago

It seems that the factor is installing LXD, and then enabling parallel installs (https://snapcraft.io/docs/parallel-installs), and then trying to launch a container. Vitaly should have a bit more information here.

SimonRichardson commented 1 year ago

I've also run into this one.

For anyone else who runs into this: I can't get parallel-instances to work correctly for now, so disabling it was the only option I had. I then had to restart LXD to get it working again:

$ sudo snap set system experimental.parallel-instances=false
$ sudo snap restart lxd
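
Before or after applying the workaround, it can help to confirm what the flag is currently set to. The sketch below wraps `snap get system experimental.parallel-instances` in a helper. The function name `parallel_instances_state` is hypothetical, and it assumes that an unset option (where `snap get` exits with an error) means the feature is off.

```shell
# Hypothetical helper: print "enabled" if snapd's experimental
# parallel-instances flag is set to true, otherwise "disabled".
# Assumption: an unset or unreadable option is treated as disabled.
parallel_instances_state() {
    value="$(snap get system experimental.parallel-instances 2>/dev/null)" || value=""
    case "$value" in
        true) echo "enabled" ;;
        *)    echo "disabled" ;;
    esac
}
```

Running `parallel_instances_state` after the two commands above should print `disabled` if the setting took effect.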

bboozzoo commented 2 months ago

I can't seem to reproduce any problem with launching the containers. I've set up VMs with 22.04 and 24.04, with LXD 5.0.3 and 5.21 respectively. With parallel instances enabled, I installed test-snapd-sh-core24 and test-snapd-sh-core24_foo in both VMs and launched both to have the proper mounts set up. Then I launched a couple of containers, launched containers within the containers, and removed them, with no issues. There's a chance this may have been fixed by https://github.com/canonical/lxd-pkg-snap/pull/375 and https://github.com/canonical/lxd-pkg-snap/pull/379.

tomponline commented 2 months ago

@jameinel @bboozzoo happy to close this one?

bboozzoo commented 2 months ago

SGTM, if @jameinel agrees then let's close it. If the problem shows up again, feel free to file a bug for snapd to investigate and we can take it from there.

JoseFMP commented 1 month ago

For me, the issue described in the original report still happens after enabling parallel installs.

LXD 5.21.1 LTS, Ubuntu 24.04, snap 2.63+24.04