canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Upgrading from 4.0 to 5.21 fails on patch `storage_prefix_bucket_names_with_project` #13860

Closed · basak closed this 1 month ago

basak commented 2 months ago

Required information

Name    Version         Rev    Tracking       Publisher   Notes
core20  20240416        2318   latest/stable  canonical✓  base
core22  20240408        1380   latest/stable  canonical✓  base
lxd     4.0.9-a29c6f1   24061  5.21/stable    canonical✓  disabled
lxd     5.21.2-34459c8  29568  5.21/stable    canonical✓  -
snapd   2.63            21759  latest/stable  canonical✓  snapd

Issue description

Following on from #13806, I tried updating the snap to 5.21/stable. This caused lxc to fail entirely.

Steps to reproduce

  1. Launch and log in to a Focal VM over ssh. I used uvt-kvm create --memory=1024 rbasak-lxd release=focal arch=amd64 (the image used was release=focal arch=amd64 label=release (20240710)), then uvt-kvm wait rbasak-lxd, then uvt-kvm ssh rbasak-lxd. The full command sequence is collected after this list.
  2. sudo lxd init, specifying all defaults except that I used the "dir" storage type.
  3. lxc remote rm images
  4. lxc remote add images https://images.lxd.canonical.com --protocol=simplestreams
  5. lxc launch images:debian/sid/amd64 foo
  6. sudo snap refresh --channel=5.21/stable lxd
  7. lxc list
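
The same steps as a single session, for copy/paste (commands copied from the list above; run the uvt-kvm commands on the host and the rest inside the VM):

# On the host: create the Focal VM and log in over ssh
uvt-kvm create --memory=1024 rbasak-lxd release=focal arch=amd64
uvt-kvm wait rbasak-lxd
uvt-kvm ssh rbasak-lxd

# Inside the VM: set up LXD 4.0 with a "dir" storage pool, create a container,
# then refresh the snap to 5.21/stable
sudo lxd init        # accept the defaults, but pick the "dir" storage backend
lxc remote rm images
lxc remote add images https://images.lxd.canonical.com --protocol=simplestreams
lxc launch images:debian/sid/amd64 foo
sudo snap refresh --channel=5.21/stable lxd
lxc list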

Expected results: listing of the one container I created previously.

Actual results:

Error: LXD unix socket not accessible: Get "http://unix.socket/1.0": EOF

Logging out of the VM and back in, then running lxc list again, I get:

Error: LXD unix socket "/var/snap/lxd/common/lxd/unix.socket" not accessible: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused

Rebooting the VM doesn't help. After that, I get:

Error: LXD unix socket not accessible: Get "http://unix.socket/1.0": EOF

tomponline commented 2 months ago

Hi @basak

What does cat /var/snap/lxd/common/lxd/logs/lxd.log show?

Does it show something like this?

time="2024-08-02T12:54:23Z" level=error msg="Failed to start the daemon" err="Failed applying patch \"storage_prefix_bucket_names_with_project\": Failed applying patch to pool \"default\": Failed to list directory \"/var/snap/lxd/common/lxd/storage-pools/default/buckets\" for volume type \"buckets\": open /var/snap/lxd/common/lxd/storage-pools/default/buckets: no such file or directory"
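
If that is the error, the missing path from the message can be checked directly (paths taken from the error above; "default" is the pool created by lxd init):

# Confirm whether the pool's buckets directory exists; in this failure it does not
ls -ld /var/snap/lxd/common/lxd/storage-pools/default/buckets
ls -la /var/snap/lxd/common/lxd/storage-pools/default/
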
tomponline commented 2 months ago

@basak FWIW I removed LXD using snap remove --purge lxd, reinstalled it from both 5.21/stable and latest/stable, and in both cases the sid container started, but systemd inside it did not start properly:

root@v1:~# lxc ls
+------+---------+------+-----------------------------------------------+-----------+-----------+
| NAME |  STATE  | IPV4 |                     IPV6                      |   TYPE    | SNAPSHOTS |
+------+---------+------+-----------------------------------------------+-----------+-----------+
| foo  | RUNNING |      | fd42:b778:3c14:c97a:216:3eff:fe47:9588 (eth0) | CONTAINER | 0         |
+------+---------+------+-----------------------------------------------+-----------+-----------+
root@v1:~# lxc shell foo
root@foo:~# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.7  19324  7648 ?        Ss   12:58   0:00 /sbin/init
root          30  0.0  0.3   9660  3668 pts/1    Ss   12:58   0:00 su -l
root          31  0.0  0.4   7260  4104 pts/1    S    12:58   0:00 -bash
root          35  0.0  0.3   8180  3624 pts/1    R+   12:58   0:00 ps aux

I suspect this is because the sid container wants cgroupv2 but Focal only provides cgroupv1 by default. Do you know of a way to enable cgroupv2 on Focal?
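
(One approach that is often suggested, untested here: Focal's systemd 245 supports the unified cgroup hierarchy but does not enable it by default, so it can be switched on via the kernel command line.)

# In /etc/default/grub, add the parameter to the kernel command line, e.g.:
#   GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"
# then regenerate the GRUB config and reboot
sudo update-grub
sudo reboot

# After the reboot this should print "cgroup2fs" if the unified hierarchy is active
stat -fc %T /sys/fs/cgroup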

tomponline commented 2 months ago

BTW steps 3 and 4 aren't needed anymore, as 4.0.10 includes the new remote by default.

tomponline commented 2 months ago

@basak re Debian Sid on Focal: I suspect the issue is the same as for Ubuntu Oracular, namely the use of systemd v256, which removes cgroupv1 support. See https://github.com/canonical/lxd/issues/13844#issuecomment-2268632337 for more info.

MggMuggins commented 2 months ago

It looks like this commit, which adds the buckets directory for storage pools, was backported to 5.0 but not to 4.0 (it was introduced in 5.0.2). I would expect upgrades from any pre-5.0.2 version to be affected by this as well.

Still meandering through the dir storage driver for a solution here; more tomorrow.
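
Until a proper fix lands, an untested workaround sketch (it simply creates the directory the patch expects, using the pool name and path from the error above, then restarts the daemon so the patch is retried on startup) would be:

# Untested: create the missing buckets directory for the "dir" pool, then restart LXD
sudo mkdir -p /var/snap/lxd/common/lxd/storage-pools/default/buckets
sudo snap restart lxd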

tomponline commented 1 month ago

Fixed by https://github.com/canonical/lxd/pull/13957; will backport into 5.21.