Limiting the number of operations is likely to be too problematic, but we should certainly make sure that we only have one storage operation per volume at any one time. So we should wait for any existing snapshot creation or deletion to finish before starting a new one on a particular volume.
@tomponline we already have some kind of locking mechanism on volumes, right?
Yes, we use the github.com/lxc/lxd/lxd/locking package, combined with the OperationLockName helper function for selecting a lock name.
Although, to avoid locking up the expired-snapshots task (or ending up with lots of goroutines blocked behind a long-running operation), perhaps we should extend the locking package with a non-blocking "try lock" that checks whether a lock is already held. In those cases we can skip over the task for that particular volume; it will get picked up the next time around (assuming the lock has been released by then).
It would also be great if one could see a bit more info in lxc operation show about exactly which snapshots or volumes are being deleted.
@tomponline yeah, for taking snapshots, I don't think we should skip, but for the expiry we definitely can skip as they'll be handled later on.
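As a rough shell analogy of the two behaviours being proposed, a sketch using flock(1); the lock-file path and the create_snapshot/expire_snapshots commands are hypothetical placeholders, not LXD internals:

```
# Blocking: snapshot creation waits until the per-volume lock is free.
flock /run/lock/lxd-vol1.lock -c "create_snapshot vol1"

# Non-blocking (-n): the expiry task gives up immediately if the lock
# is held, leaving the volume for the next scheduled run.
flock -n /run/lock/lxd-vol1.lock -c "expire_snapshots vol1" \
    || echo "vol1 busy, skipping until next run"
```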
A short update: btrfs check did not show any significant problems (just some mismatches in the free-space caches). After I manually deleted all volumes created by docker, btrfs no longer hangs and lxd became fully operational. I moved the data out and reformatted the partition, just in case.
I am thinking about how to prevent this from happening again. Is it possible to prevent docker inside the containers from using the btrfs storage driver? The containers can be used by other users, so I do not want to disable docker (security.nesting=true) completely, and I would still like to be able to do automated (daily) snapshots.
Sadly, no. btrfs always allows unprivileged creation of subvolumes, so it's not something that can be turned off. Unprivileged deletion of subvolumes is controlled by a mount flag, but disabling that would just allow folks to create subvolumes and then be unable to remove them, making things even worse.
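This is easy to verify from an unprivileged shell on any btrfs filesystem (a sketch; ./mysubvol is just a scratch path in a directory the user can write to):

```
# Unprivileged subvolume creation always succeeds on btrfs...
btrfs subvolume create ./mysubvol

# ...but unprivileged deletion only works if the filesystem was
# mounted with -o user_subvol_rm_allowed.
btrfs subvolume delete ./mysubvol
```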
@stgraber can I fix this?
Absolutely!
Required information
I am sorry, I cannot run lxc info because lxd is currently disabled (due to the problem being reported). However, I can provide lxc info for another machine in the cluster, which should have the same configuration.
Issue description
lxd creates a new "Cleaning up expired instance snapshots" task every minute, indefinitely, until a limit of 10000 threads is reached, after which it is killed by the kernel.
The tasks for deleting snapshots appear to get stuck in the kernel, so no task ever completes.
Steps to reproduce
I think the problem is triggered by a combination of the following factors:

- security.nesting=true (to allow running docker inside the container)
- snapshots.schedule: 0 3 * * * (daily snapshots)

Whenever docker creates containers, new btrfs subvolumes are created, which are then duplicated by lxd during the daily snapshots (even if all docker containers are stopped). This way the number of btrfs subvolumes grows very quickly. On my machine, I ended up with about 46000 subvolumes before the problem took place.
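For what it's worth, a minimal reproduction sketch of this setup; the instance name, image, and pool path are placeholders, and the snap-style pool path will differ on non-snap installs:

```
# Instance with nesting enabled and a daily snapshot schedule.
lxc launch ubuntu:22.04 dockerhost -c security.nesting=true
lxc config set dockerhost snapshots.schedule "0 3 * * *"

# Inside the instance, install docker; on a btrfs root it typically
# selects the btrfs storage driver and creates subvolumes per layer.
lxc exec dockerhost -- sh -c "apt-get update && apt-get install -y docker.io"
lxc exec dockerhost -- docker run -d --name web nginx

# On the host, watch the subvolume count grow after each daily snapshot.
sudo btrfs subvolume list /var/snap/lxd/common/lxd/storage-pools/default | wc -l
```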
Information to attach
[ ] Any relevant kernel output (dmesg)
[ ] Container log (lxc info NAME --show-log)
[ ] Container configuration (lxc config show NAME --expanded)
[ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
[ ] Output of the client with --debug
[ ] Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)

Attached outputs (contents omitted here):
dmesg
ps auxH | grep lxd
lxc operation list (truncated)
top
lxc storage info local --target server1
Further notes
It is well known that btrfs performs poorly with quotas and large numbers of subvolumes. Since this issue was discovered, I have disabled quotas, but btrfs still hangs in the kernel (even after multiple reboots). Currently btrfs check is running, which takes about a day. Next I will try to delete the docker subvolumes manually from the snapshots (see the cleanup sketch below). The issue was originally reported on the lxd discussion forum.
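For reference, a sketch of that manual cleanup; /mnt/pool is a placeholder for wherever the btrfs filesystem is mounted, and the grep pattern matches the subvolumes docker's btrfs driver creates under /var/lib/docker:

```
# List every subvolume on the filesystem (path is the last field, so
# this assumes no spaces in subvolume paths) and delete the ones
# belonging to docker's btrfs storage driver.
sudo btrfs subvolume list /mnt/pool | awk '{print $NF}' \
    | grep 'docker/btrfs/subvolumes/' \
    | while read -r sub; do
          # Read-only lxd snapshots may need to be made writable first:
          #   sudo btrfs property set "/mnt/pool/<snapshot>" ro false
          sudo btrfs subvolume delete "/mnt/pool/$sub"
      done
```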