canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Cluster member tries to autostart all containers after reboot #12517

Closed: janmuennich closed this 2 months ago

janmuennich commented 11 months ago

Issue description

For a routine reboot of a cluster member, all containers on that member were moved to another cluster member beforehand. After the reboot, the server nevertheless tried to start all containers in the cluster, in alphabetical order, even though they were already running on other cluster members. After starting about 10 containers, LXD panicked.

After another reboot, everything was running fine again with no attempt to start any containers.

The issue is critical, since the Ceph volumes were double-mounted, which resulted in corrupted file systems (although I was able to restore recent snapshots).
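
For anyone auditing a similar incident: one way to spot an impending double mount is to check an RBD image's watchers before it is used again. This is a hedged sketch, reusing the Ceph client and pool names that appear in the logs below; the image name container_xxx is hypothetical:

# A second watcher suggests the image is still in use on another host.
rbd --id admin --cluster ceph --pool pool status container_xxx
# List the RBD devices currently mapped on this host:
rbd showmapped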

How could this happen?

LXD didn't log anything relevant in lxd.log. Output from syslog:

Nov  4 23:43:25 systemd[1]: Listening on Socket unix for snap application lxd.daemon.
Nov  4 23:43:27 systemd[1]: Started Service for snap application lxd.daemon.
Nov  4 23:43:27 lxd.daemon[2797]: => Preparing the system (24322)
Nov  4 23:43:27 lxd.daemon[2797]: ==> Loading snap configuration
Nov  4 23:43:27 lxd.daemon[2797]: ==> Setting up mntns symlink (mnt:[4026533550])
Nov  4 23:43:27 lxd.daemon[2797]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/storage-pools
Nov  4 23:43:27 lxd.daemon[2797]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/devices
Nov  4 23:43:27 lxd.daemon[2797]: ==> Setting up persistent shmounts path
Nov  4 23:43:27 lxd.daemon[2797]: ====> Making LXD shmounts use the persistent path
Nov  4 23:43:27 lxd.daemon[2797]: ====> Making LXCFS use the persistent path
Nov  4 23:43:27 lxd.daemon[2797]: ==> Setting up kmod wrapper
Nov  4 23:43:27 lxd.daemon[2797]: ==> Preparing /boot
Nov  4 23:43:27 lxd.daemon[2797]: ==> Preparing a clean copy of /run
Nov  4 23:43:27 lxd.daemon[2797]: ==> Preparing /run/bin
Nov  4 23:43:27 lxd.daemon[2797]: ==> Preparing a clean copy of /etc
Nov  4 23:43:28 lxd.daemon[2797]: ==> Preparing a clean copy of /usr/share/misc
Nov  4 23:43:28 lxd.daemon[2797]: ==> Setting up ceph configuration
Nov  4 23:43:28 lxd.daemon[2797]: ==> Setting up LVM configuration
Nov  4 23:43:28 lxd.daemon[2797]: ==> Setting up OVN configuration
Nov  4 23:43:28 lxd.daemon[2797]: ==> Rotating logs
Nov  4 23:43:28 lxd.daemon[2797]: ==> Setting up ZFS (0.8)
Nov  4 23:43:28 lxd.daemon[2797]: ==> Escaping the systemd cgroups
Nov  4 23:43:28 lxd.daemon[2797]: ====> Detected cgroup V1
Nov  4 23:43:28 lxd.daemon[2797]: ==> Escaping the systemd process resource limits
Nov  4 23:43:28 lxd.daemon[2797]: ==> Increasing the number of keys for a nonroot user
Nov  4 23:43:28 lxd.daemon[2797]: ==> Increasing the number of bytes for a nonroot user
Nov  4 23:43:28 lxd.daemon[2797]: ==> Disabling shiftfs on this kernel (auto)
Nov  4 23:43:28 lxd.daemon[2797]: => Starting LXCFS
Nov  4 23:43:28 lxd.daemon[2956]: Running constructor lxcfs_init to reload liblxcfs
Nov  4 23:43:28 lxd.daemon[2956]: mount namespace: 4
Nov  4 23:43:28 lxd.daemon[2956]: hierarchies:
Nov  4 23:43:28 lxd.daemon[2956]:   0: fd:   5:
Nov  4 23:43:28 lxd.daemon[2956]:   1: fd:   6: name=systemd
Nov  4 23:43:28 lxd.daemon[2956]:   2: fd:   7: net_cls,net_prio
Nov  4 23:43:28 lxd.daemon[2956]:   3: fd:   8: cpuset
Nov  4 23:43:28 lxd.daemon[2956]:   4: fd:   9: perf_event
Nov  4 23:43:28 lxd.daemon[2956]:   5: fd:  10: memory
Nov  4 23:43:28 lxd.daemon[2956]:   6: fd:  11: cpu,cpuacct
Nov  4 23:43:28 lxd.daemon[2956]:   7: fd:  12: freezer
Nov  4 23:43:28 lxd.daemon[2956]:   8: fd:  13: rdma
Nov  4 23:43:28 lxd.daemon[2956]:   9: fd:  14: devices
Nov  4 23:43:28 lxd.daemon[2956]:  10: fd:  15: hugetlb
Nov  4 23:43:28 lxd.daemon[2956]:  11: fd:  16: pids
Nov  4 23:43:28 lxd.daemon[2956]:  12: fd:  17: blkio
Nov  4 23:43:28 lxd.daemon[2956]: Kernel supports pidfds
Nov  4 23:43:28 lxd.daemon[2956]: Kernel supports swap accounting
Nov  4 23:43:28 lxd.daemon[2956]: api_extensions:
Nov  4 23:43:28 lxd.daemon[2956]: - cgroups
Nov  4 23:43:28 lxd.daemon[2956]: - sys_cpu_online
Nov  4 23:43:28 lxd.daemon[2956]: - proc_cpuinfo
Nov  4 23:43:28 lxd.daemon[2956]: - proc_diskstats
Nov  4 23:43:28 lxd.daemon[2956]: - proc_loadavg
Nov  4 23:43:28 lxd.daemon[2956]: - proc_meminfo
Nov  4 23:43:28 lxd.daemon[2956]: - proc_stat
Nov  4 23:43:28 lxd.daemon[2956]: - proc_swaps
Nov  4 23:43:28 lxd.daemon[2956]: - proc_uptime
Nov  4 23:43:28 lxd.daemon[2956]: - proc_slabinfo
Nov  4 23:43:28 lxd.daemon[2956]: - shared_pidns
Nov  4 23:43:28 lxd.daemon[2956]: - cpuview_daemon
Nov  4 23:43:28 lxd.daemon[2956]: - loadavg_daemon
Nov  4 23:43:28 lxd.daemon[2956]: - pidfds
Nov  4 23:43:29 lxd.daemon[2797]: => Starting LXD
Nov  4 23:43:29 lxd.daemon[3021]: time="2023-11-04T23:43:29Z" level=warning msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
Nov  4 23:43:30 lxd.daemon[3021]: time="2023-11-04T23:43:30Z" level=warning msg="Dqlite: attempt 1: server 10.0.4.32:8443: no known leader"
Nov  4 23:43:30 lxd.daemon[3021]: time="2023-11-04T23:43:30Z" level=warning msg="Dqlite: attempt 1: server 10.0.4.33:8443: no known leader"
Nov  4 23:43:44 lxd.daemon[3021]: time="2023-11-04T23:43:44Z" level=error msg="Failed mounting storage pool" err="Failed to run: rbd --id admin --cluster ceph --pool pool info lxd_pool: signal: killed" pool=pool
Nov  4 23:43:44 lxd.daemon[3021]: time="2023-11-04T23:43:44Z" level=warning msg="Failed to initialize fanotify, falling back on inotify" err="Failed to initialize fanotify: invalid argument"
Nov  4 23:43:45 lxd.daemon[2797]: => LXD is ready
Nov  4 23:45:08 lxd.daemon[3021]: time="2023-11-04T23:45:08Z" level=warning msg="Failed auto start instance attempt" attempt=1 err="Unable to resolve container rootfs: lstat /var/snap/lxd/common/lxd/containers/xxx: no such file or directory" instance=xxx maxAttempts=3 project=default
Nov  4 23:45:14 lxd.daemon[3021]: time="2023-11-04T23:45:14Z" level=warning msg="Failed auto start instance attempt" attempt=2 err="Unable to resolve container rootfs: lstat /var/snap/lxd/common/lxd/containers/xxx: no such file or directory" instance=xxx maxAttempts=3 project=default
Nov  4 23:45:20 lxd.daemon[3021]: time="2023-11-04T23:45:20Z" level=warning msg="Failed auto start instance attempt" attempt=3 err="Unable to resolve container rootfs: lstat /var/snap/lxd/common/lxd/containers/xxx: no such file or directory" instance=xxx maxAttempts=3 project=default
Nov  4 23:45:20 lxd.daemon[3021]: time="2023-11-04T23:45:20Z" level=error msg="Failed to auto start instance" err="Unable to resolve container rootfs: lstat /var/snap/lxd/common/lxd/containers/xxx: no such file or directory" instance=xxx project=default
Nov  4 23:45:21 lxd.daemon[3021]: panic: runtime error: invalid memory address or nil pointer dereference
Nov  4 23:45:21 lxd.daemon[3021]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0xd83ae5]
Nov  4 23:45:21 lxd.daemon[3021]: goroutine 494 [running]:
Nov  4 23:45:21 lxd.daemon[3021]: github.com/lxc/lxd/lxd/bgp.(*Server).RemovePrefixByOwner(0x0, {0xc0002190a0, 0x10})
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/bgp/server.go:322 +0x65
Nov  4 23:45:21 lxd.daemon[3021]: github.com/lxc/lxd/lxd/device.bgpRemovePrefix(0xc000cac660, 0xc000014038?)
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/device/device_utils_network.go:581 +0xe5
Nov  4 23:45:21 lxd.daemon[3021]: github.com/lxc/lxd/lxd/device.(*nicBridged).Stop(0xc000cac660)
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/device/nic_bridged.go:788 +0x3a
Nov  4 23:45:21 lxd.daemon[3021]: github.com/lxc/lxd/lxd/instance/drivers.(*lxc).deviceStop(0xc000b80d80, {0x7f0c685491d8, 0xc000cac660}, 0x0, {0x0, 0x0})
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/instance/drivers/driver_lxc.go:1501 +0x1c5
Nov  4 23:45:21 lxd.daemon[3021]: github.com/lxc/lxd/lxd/instance/drivers.(*lxc).startCommon.func2()
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/instance/drivers/driver_lxc.go:1984 +0x3f
Nov  4 23:45:21 lxd.daemon[3021]: github.com/lxc/lxd/lxd/revert.(*Reverter).Fail(0xc000bc5310)
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/revert/revert.go:29 +0x35
Nov  4 23:45:21 lxd.daemon[3021]: github.com/lxc/lxd/lxd/instance/drivers.(*lxc).startCommon(0xc000b80d80)
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/instance/drivers/driver_lxc.go:2014 +0x21d8
Nov  4 23:45:21 lxd.daemon[3021]: github.com/lxc/lxd/lxd/instance/drivers.(*lxc).Start(0xc000b80d80, 0x0)
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/instance/drivers/driver_lxc.go:2301 +0x473
Nov  4 23:45:21 lxd.daemon[3021]: main.instancesStart(0xc0013200b0, {0xc000780400?, 0x3e, 0x0?})
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/instances.go:263 +0x9dc
Nov  4 23:45:21 lxd.daemon[3021]: main.storageStartup.func2()
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/storage.go:171 +0x2a5
Nov  4 23:45:21 lxd.daemon[3021]: created by main.storageStartup
Nov  4 23:45:21 lxd.daemon[3021]: #011/build/lxd/parts/lxd/src/lxd/storage.go:138 +0x2ef
Nov  4 23:45:21 lxd.daemon[2797]: => LXD failed with return code 2
Nov  4 23:45:21 systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Nov  4 23:45:21 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Nov  4 23:45:21 systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 1.
Nov  4 23:45:21 systemd[1]: Stopped Service for snap application lxd.daemon.
tomponline commented 10 months ago

@simondeziel please can you see if this is still occurring on 5.0/edge?

tomponline commented 9 months ago

@simondeziel did you get a chance to test this one yet?

tomponline commented 9 months ago

If it's an issue, I'd like to get the fix into LXD 5.0.3.

simondeziel commented 9 months ago

> @simondeziel did you get a chance to test this one yet?

I'll get back to you by EOD.

simondeziel commented 9 months ago

I'm afraid this will require some more time, as I first need to set up a testing environment using Ceph as the storage backend.

simondeziel commented 8 months ago

@tomponline using 5.0/stable (!= 5.0/edge) with the same snap revision (24322), I could not reproduce the issue.
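
(For reference, a sketch of how to check and match the snap channel and revision; this assumes the lxd snap is already installed:)

snap list lxd                          # shows the tracked channel and revision (e.g. 24322)
snap refresh lxd --channel=5.0/stable  # switch tracking to 5.0/stable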

@janmuennich here's what I did to try and reproduce your issue:

I set up a physical machine and created 3 VMs with 8 cores, 40G of RAM, and an extra 75G disk each. I set up microceph to use the extra disk in each VM, then clustered LXD. From there, I created 3 instances that were automatically spread between the cluster members. I then did a rolling reboot: first lxc cluster evacuate, then a reboot, then lxc cluster restore to bring each member back.
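
In commands, the rolling reboot of each member looked roughly like this (member name node1 is illustrative):

lxc cluster evacuate node1   # migrate node1's instances to other members
# ... reboot node1 ...
lxc cluster restore node1    # move the evacuated instances back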

LXD moved the instances off of the node being evacuated and moved them back when it was restored.

Failing to reproduce the issue, I then tried a more aggressive scenario where I killed one of the cluster members, causing its instance to go into the ERROR state. I then used lxc move --target to move the instance to one of the 2 remaining cluster members and started it. After that, I started the killed cluster member again, and even then it behaved fine and didn't attempt to start the instance that had been migrated off of it while it was dead.
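
A sketch of those recovery steps for the dead member's instance (instance and member names illustrative):

lxc move c1 --target node2   # relocate the ERROR-state instance to a surviving member
lxc start c1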

janmuennich commented 8 months ago

Thanks for trying to reproduce the issue!

Just to clarify, I didn't use lxc cluster evacuate; we used our own script, which moves all instances on a machine to another spare one with lxc move. So the affected machine was completely empty of any instances, and the database was up to date.
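
For context, a minimal sketch of what such a script does; the member names are hypothetical, and the column letters follow lxc list (n = name, L = location):

# Move every instance located on node1 to a spare member, one at a time.
for c in $(lxc list --format csv -c nL | awk -F, '$2 == "node1" {print $1}'); do
    lxc move "$c" --target node2
done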

This is the procedure we use regularly, and the issue occurred only that one time. So I guess it's not easy to reproduce :(

simondeziel commented 8 months ago

@janmuennich you replied just before I tore down my test environment, so I was able to test without lxc cluster evacuate, but sure enough, my few attempts didn't reproduce the issue :/

janmuennich commented 8 months ago

It probably makes sense to close this issue for now, until someone else encounters the same thing. We did run lxc profile set default boot.autostart false, though, to prevent this from happening again in production.
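
For completeness, the same key can also be set per instance instead of on the profile; a sketch with an illustrative instance name:

lxc profile set default boot.autostart false   # profile-wide, as above
lxc config set c1 boot.autostart false         # or per instance
lxc config get c1 boot.autostart               # print the instance-local value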

yo004192 commented 8 months ago

I may have also hit this issue on LXD 5.11. What happened: I removed the snap and LXD from one node because of a misoperation on my part. After I reinstalled them, that node tried to start all instances in the cluster. Afterwards, many instances' rootfs were broken (including ones that had been running before). I checked my lxc config, and boot.autostart is not set. About 1/5 of my instances' rootfs (ext4) were broken. However, I don't know whether this was the root cause of the breakage; I just suspect it.
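
If it helps anyone triaging similar corruption: an unmounted ext4 rootfs can be checked non-destructively with e2fsck. A sketch with a hypothetical image name and device path:

rbd map pool/container_xxx   # map the instance's RBD image
fsck.ext4 -fn /dev/rbd0      # -n: read-only check, report problems without fixing
rbd unmap /dev/rbd0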

simondeziel commented 2 months ago

We'll close this issue as we were unsuccessful in reproducing it. If someone has a reliable reproducer or anything else that could help debug it, please re-open this issue or create a new one. Thanks.