It's most likely some other process that's forked the mount table and is now preventing LXD from unmounting the container...
You can run grep containers/test1 /proc/*/mountinfo to find out what process that is.
You can then run nsenter -t <PID> -m -- umount /var/lib/lxd/storage-pools/lxd/containers/test1 to get rid of that mount, at which point lxc delete should work again...
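For reference, a minimal sketch of that sequence, using the paths above and a placeholder PID (whatever the grep actually turns up):
# find processes whose mount namespace still references the container
grep containers/test1 /proc/*/mountinfo
# enter the mount namespace of one such process (e.g. PID 1234) and drop the mount
nsenter -t 1234 -m -- umount /var/lib/lxd/storage-pools/lxd/containers/test1
# the delete should now go through
lxc delete test1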
You mean cat /proc/*/mountinfo | grep containers/test1 ?
No hits ...
lxc delete still reports dataset is busy :(
Edit: grepping for the other, running container results in lots of hits, so I think I have formatted that grep correctly. It seems there is nothing referencing test1 in /proc/*/mountinfo. Any further ideas? :)
Hmm, then it's not mounted anywhere visible which would likely make it a kernel bug... You can wait for a while hoping for the kernel to untangle whatever's going on or you can reboot the system which will fix it for sure...
Sorry I don't have a better answer for this.
@stgraber Oh dear Cthulhu, that's bad. That also explains why I'm seeing it across several systems, as I keep my servers in sync with regard to kernel/OS/package versions.
I just checked on one of my other systems, and there I have a dataset which I still cannot destroy even after 48+ hours. So it does not seem this will go away on its own. There it is also "invisible".
If you want access to the server to poke around a bit, Stéphane, let me know. Otherwise I guess I'll just have to mitigate this manually (sigh), update my kernels when I get the chance, and hope that resolves the issue.
PS: I am not used to seeing grep invoked that way; your command was of course correctly formatted, I just assumed it wasn't since I didn't get any hits #n00b
Should I report this as a bug somewhere, you think?
If others stumble on this issue: There is a workaround in that it is possible to rename the dataset. So if your container is stopped, you can do:
zfs rename lxd/containers/test1 lxd/containers/test1failed
After which you can issue
lxc delete test1
However you then still have this dataset hanging around, which you will need to clean up at a later date, i.e. after a reboot I suppose. This pretty much sucks! :D
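For completeness, a sketch of that later cleanup, assuming the rename above and that whatever was holding the dataset has since let go (e.g. after a reboot):
# destroy the parked dataset once it is no longer busy
zfs destroy lxd/containers/test1failed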
Yeah, that's pretty odd, I wonder what's keeping that active... You don't have any running zfs command for that dataset (zfs send, zfs snapshot, ...)?
Just run ps aux | grep test1 to be sure.
If not, then I'd be happy to take a look, see if anything stands out. There's a pretty good chance that it's a kernel bug, but we haven't seen reports of this before so it's intriguing.
(Note that I'm on vacation in Europe this week so not quite as around as usual :))
Nope no zfs commands running. I have sent you access by mail :) - enjoy your vacation..!!
Very weird. I poked around for a bit, eventually restarting the lxd process, which was apparently enough to get zfs to unstick, and I could then delete the dataset just fine.
Now that we know that kicking lxd apparently unsticks zfs, can you let me know if you have another machine with the same issue (or can cause the one I already have access to to run into it again)?
I'd like to see what LXD shows as open prior to being killed, then if just killing it is enough to make zfs happy and if not, then why would lxd starting again somehow unstick zfs.
FWIW, what I tried before restarting lxd was:
None of which showed anything relevant...
It could be an fd leak from a file that was read or written from the container by LXD and wasn't closed, but what's odd is that if that were the case, we should have seen an fd with the container path, and there were none... Hopefully I can look at another instance of this problem and figure that part out.
Marking incomplete for now, @Kramerican let me know when you have another affected system.
I sure will @stgraber I'm also on vacation and haven't had a chance to check if I can provoke this behaviour. I'll let you know.
@Kramerican still on vacation? :)
@stgraber Yes until next week - but I will set some time aside then to try and force this behavior.
@stgraber I have had little luck in forcing this behavior, but this has been cropping up all over the shop these last few days.
I had written some fallback code in our tools which simply renames the dataset, so that lxc delete could be run. These datasets are still "stuck" and zfs refuses to delete them. I have not restarted lxd in order to delete them - is it enough for you to get access to one of these systems to diagnose further? In which case let me know and I'll give you access. Thanks..!
@Kramerican yep, having access to one of the systems with such a stuck dataset should be enough to try to track down what LXD's got open that would explain the busy error.
@stgraber Excellent. Mail with details sent.
@Kramerican so the only potential issue I'm seeing is a very large number of mapped /config files, which is a leak that I believe has already been fixed with a combination of lxd and liblxc fixes. Any chance you can upgrade your systems to 3.0.1 of both liblxc1 and lxd? Both have been available for about a week now.
If it's an option at all, at least on your Ubuntu 18.04 systems, I'd recommend considering moving to the lxd snap (--channel=3.0/stable in your case) as that would get you a much faster turnaround for fixes than we can do with the deb package.
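For anyone taking that route, a rough sketch of the deb-to-snap move on 18.04 (the channel is the one suggested above; lxd.migrate walks you through moving the existing containers and data across and then retiring the deb):
snap install lxd --channel=3.0/stable
lxd.migrate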
@stgraber Excellent. However on the system where you have access, I have apt upgrade hanging at 95% at Setting up lxd (3.0.1-0ubuntu1~18.04.1) ...
/var/log/lxd/lxd.log shows an entry which I think is responsible:
lvl=eror msg="Failed to cleanly shutdown daemon: raft did not shutdown within 10s" t=2018-07-04T21:31:27+0200
Is raft a process I can kill? Suggestions on how to unstick the upgrade?
@stgraber Nevermind - it got unstuck after a while. Everything seems fine.
I will upgrade all systems and report back if the issue persists. Thanks..!
@Kramerican pretty sure it got unstuck because I logged in and ran systemctl stop lxd lxd.socket to unblock things. Looks like the RAFT database is hitting a timeout at startup.
It's actually a bug that 3.0.1 fixes but if your database has too many transactions prior to the upgrade, it still fails to start. The trick to unstick it is to temporarily move it to a very fast tmpfs which I'm doing on that system now.
@stgraber ah yes I saw that lxc ls and other commands were not working. I won't mess around on that system anymore until you report back.
A series of commands that would help unstick lxd would be nice to have here, in case I see this happen on one of the other ~10 hosts I need to upgrade.
@Kramerican all done, that system is good to go.
If you hit the same problem, you'll need to:
This will unstick the update. Once the update is all done, run again (for good measure):
That should ensure that LXD is fully stopped (containers are still running fine though). Once that's done, do:
You'll see the daemon start, let it run until it hits "Done updating instance types" which is when it'll be ready for normal operation, then hit ctrl+c to stop it. Once done, do:
And you'll be back online with the newly compacted and much faster database.
This is only needed on systems where LXD isn't able to load its database within the 10s timeout so hopefully a majority of your systems will not need this trick. Once LXD successfully starts once on 3.0.1, the database gets compacted automatically in the background as well as on exit to prevent this problem from ever occurring again.
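Purely as an illustration (not the exact commands run on that system), the tmpfs dance could look something like this on a deb-based install, assuming the database lives under /var/lib/lxd/database:
# stop LXD; containers keep running
systemctl stop lxd lxd.socket
# park the slow on-disk database and stand up a tmpfs in its place
mv /var/lib/lxd/database /var/lib/lxd/database.slow
mkdir /var/lib/lxd/database
mount -t tmpfs tmpfs /var/lib/lxd/database
cp -a /var/lib/lxd/database.slow/. /var/lib/lxd/database/
# run the daemon in the foreground; wait for "Done updating instance types", then ctrl+c
/usr/lib/lxd/lxd --group lxd
# put the now-compacted database back on disk and restart normally
cp -a /var/lib/lxd/database /var/lib/lxd/database.compacted
umount /var/lib/lxd/database
rmdir /var/lib/lxd/database
mv /var/lib/lxd/database.compacted /var/lib/lxd/database
systemctl start lxd
# keep /var/lib/lxd/database.slow around as a backup until everything checks out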
@stgraber This is pure epic. Thanks so much, I'll get started cracks knuckles :)
@Kramerican I've also deleted the two failed datasets on sisko, so restarting LXD did the trick to unstick zfs, now the question is whether we'll be seeing the issue re-appear with 3.0.1.
@stgraber ok so I completed the upgrade on all my hosts, only had to follow your steps here on one other system :+1:
In the process, however, it turned out that one of my systems was already up to date with 3.0.1, and there the failure with a stuck dataset happened today.
I have just sent a mail with details and access to the system.
@stgraber Did you see my last message yesterday, where I'd found that this had actually already happened on a 3.0.1 system? You should have access details in your inbox :)
@Kramerican Hi, I saw the comment and e-mail but haven't yet had time to look at that system.
@stgraber Can I assume that you are on track with this bug and it is in the process of being fixed? Let me know if you need the user account I set up for you, otherwise I'd like to nuke it from that system. Due diligence and all :)
@Kramerican Hi, unfortunately no, I haven't had time to look into this yet as I've been having management meetings this week with limited time to look into LXD bugs.
@stgraber That is all good big buddy - I will leave that user account for now, so you can dig into that system whenever it suits you best.
@stgraber This is an on-going issue across most hosts in our infrastructure. I now have quite a few hanging datasets lying around, as this keeps cropping up. All systems are running LXD 3.0.1 on Bionic at the moment.
Hope you can find time to give this another look one of these days :) Please let me know if you need me to re-send access credentials.
@stgraber Hi, I'm sorry for this, I have run into an issue and hope you can help me. When I type lxc list, I get this:
+----------------+---------+------+------+------------+-----------+
|      NAME      |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+----------------+---------+------+------+------------+-----------+
| flowing-beagle | ERROR   |      |      | PERSISTENT |           |
+----------------+---------+------+------+------------+-----------+
| lxdmcj         | ERROR   |      |      | PERSISTENT |           |
+----------------+---------+------+------+------------+-----------+
And then, when I try to delete lxdmcj, it gives me this message: error: cannot open 'mcj-lxd-zfs': dataset does not exist
@ma3252788 that's unrelated to the issue we're investigating here. Your error suggests that your entire zpool is somehow offline. Check with zpool status to see if it's offline due to data corruption or if it's entirely missing from ZFS (not imported).
I have run into the same problem on two different systems recently:
Failed to destroy ZFS filesystem: cannot destroy 'z/lxd/containers/otp1': dataset is busy
I'm on snap lxd version 3.6, rev 9298.
snap list: lxd 3.6 9298 stable canonical✓ -
/etc/lsb-release: DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
uname -a: Linux star 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I did the mountinfo search and did a ps -f on the resulting pids, which can be done in one line like this:
ps -f -p `grep containers/otp1 /proc/*/mountinfo | sed 's-^/proc/--' | sed 's-/.*--' | tr '\n' ',' | sed 's/,$//'`
The result was 26 processes of the kind:
root 560 1 0 Oct15 ? 00:00:00 [lxc monitor] /var/snap/lxd/common/lxd/containers fil
(where fil is a running container) plus this process:
root 2356 1 0 Oct15 ? 00:00:05 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
The system has been running for 10 days, so Oct15 must be around boot time.
For the lxc monitor processes, I compared the 26 containers that I saw in ps with the list of the 31 running containers. The 5 missing containers were containers that I created yesterday or today. Two were new, 1 was renamed (lxc move), 1 was copied (lxc copy), and the 5th was the container that was copied from.
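For what it's worth, an equivalent form of the mountinfo-to-ps one-liner above, using grep -l to pull the PIDs directly instead of the sed chain (same result, just easier to read):
ps -f -p "$(grep -l containers/otp1 /proc/*/mountinfo | cut -d/ -f3 | paste -sd, -)"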
In both cases, the container that cannot be deleted is a container that I've been cloning frequently from another container, cycling through this pattern:
lxc copy --container-only otp otp1
lxc profile apply otp1 profile1,profile2
lxc start otp1
... use otp1 for a while
lxc stop otp1
lxc delete otp1
@melato the issue from @Kramerican is a bit different because in his case he has no actual references being held for the mount.
Your case is somewhat more common: effectively the container was already mounted by the time lxcfs started (possibly because lxcfs crashed and restarted?), which meant lxcfs would get a copy of the mount into its mount namespace, holding a reference to it.
To unblock things, what you can do is use nsenter -t PID -m against the processes that you found have mount references, then unmount the path through that shell; once all references are gone, delete should work fine.
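Spelled out, a minimal sketch of that unblock, assuming the snap paths and the lxcfs PID (2356) from the ps output above and a storage pool named "default" (adjust both to whatever your mountinfo grep actually shows):
# enter the mount namespace of the process holding the reference
nsenter -t 2356 -m
# inside that shell, unmount the stale container path listed in that process's mountinfo
umount /var/snap/lxd/common/lxd/storage-pools/default/containers/otp1
exit
# with all references gone, the delete should now succeed
lxc delete otp1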
@brauner for the lxcfs part of this, any chance we can make lxcfs clean up its mount namespace so it doesn't hold those refs?
The lxc monitor side is a bit harder because it's not itself maintaining a mount namespace; instead it's using whatever mount namespace LXD was running with at the time, effectively holding onto an old namespace as LXD gets refreshed. I've not yet found a good way to avoid this issue; some of the options are:
- have [lxc monitor] use an intermediate mount namespace that it cleans up after initialization to only reference what it absolutely needs

I was able to delete the container after doing the nsenter -m suggestion for the lxcfs process.
I can confirm the bug that @melato has discovered recently. In my case it's a cloned container too. The nsenter workaround fixed it.
I've hit this "Failed to destroy ZFS filesystem" bug on LXD 3.2 on Alpine 3.8.1 as well. I'll try @stgraber's workarounds & report back
It appears that my container may have gotten into this state because the underlying image expired & was removed from the image cache. ZFS was in a weird state where it said the dataset was busy, but it certainly wasn't mounted under anything accessible on the host.
I'll make a local copy of the image (debian stretch) before launch so hopefully the image won't expire (I need to check into that).
Luckily this system is not a critical production box, so a simple fix to delete the container was:
I'll chime in if it happens again...
BTW this is Linux 4.14, ZFS v0.7.8-1, lxd 3.2, lxcfs 3.0.1, lxc 2.1.1
(so legacy cgroups mode)
For the record, after much wrangling, I've gotten LXC 3.0.2 to compile on Alpine. Haven't had a chance to test it yet.
> If others stumble on this issue: There is a workaround in that it is possible to rename the dataset. So if your container is stopped, you can do:
> zfs rename lxd/containers/test1 lxd/containers/test1failed
> After which you can issue
> lxc delete test1
> However you then still have this dataset hanging around, which you will need to clean up at a later date, i.e. after a reboot I suppose. This pretty much sucks! :D
In my case your workaround works perfectly, and I am able to destroy the dataset without a reboot after renaming it and deleting the lxd container.
Just ran into this issue running LXD 3.10 (snap), which restarted on February 12, one day after the package was published on snapcraft.io. From lxd.log:
LXD 3.10 is starting in normal mode" path=/var/snap/lxd/common/lxd
[...]
Initializing global database"
Updating the LXD global schema. Backup made as \"global.bak\"
The problem is indeed bound to the 10s timeout, as it takes exactly 10s for lxc to print the "dataset is busy" error. Comment https://github.com/lxc/lxd/issues/4656#issuecomment-402550930 made me think that lxd >= 3.0.1 was able to compact the database in the background to avoid this issue, but apparently this is not the case, or the cause of the problem is different.
The machine where I'm experiencing the issue has 27 containers, but it is not heavily loaded. Containers could be removed reliably up to a few days ago. Is there any other useful information I can let you have? Thanks!
Cc: @blackboxsw
@paride your issue is likely a different one and just has to do with snapd generating new mount namespaces which hold up mount table entries for ZFS.
For one such un-deletable stopped container, please do:
This will hopefully show you some PIDs; those are the PIDs that are in a mount namespace where this container is still mounted.
You can then unblock this issue by doing:
Doing this for any of the returned PIDs will usually make all the mounts go away for all the other PIDs, as they're likely to share the same mount namespace.
We landed code in the LXD snap a while back to avoid such issues by doing some very complicated dance in the various mount tables, this usually worked well, except that the most recent snapd release had an upgrade bug which made it so it would hide the old mount table, making our normal mitigation useless for some users... As far as I can tell, this bug got resolved in snapd so this shouldn't happen again with the next snapd release (crosses fingers)...
Thanks @stgraber, I did try grepping /proc/*/mountinfo but nothing there matches the name of the container I'm trying to delete:
$ lxc delete paride-ubuntu-core-16
Error: Failed to destroy ZFS filesystem: cannot destroy 'zfs-lxd/containers/paride-ubuntu-core-16': dataset is busy
$ grep container/paride-ubuntu-core-16 /proc/*/mountinfo
$
@paride oh, oops, try containers/paride-ubuntu-core-18
@stgraber thanks, it worked.
Minty fresh Ubuntu 18.04 system, LXD v3.0.0 (latest from apt; how do I get v3.0.1?)
Started seeing this beginning last week crop up arbitrarily across my infrastructure. Out of ~10 delete operations, I have seen this happen to 3 containers on 2 different systems.
Tried googling around a bit and I have tried the most common tips for figuring out what might be keeping the dataset busy: there are no snapshots or dependencies, and the dataset is unmounted according to zfs list.
Could LXD still be holding the dataset? I see there are a number of zfs-related fixes in v3.0.1 but I cannot do an apt upgrade to that version..?
Edit: issuing systemctl restart lxd does not resolve the issue, so maybe it's not lxd after all. Strange...
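For anyone landing here with the same symptoms, a few diagnostics that can back up the "no snapshots or dependencies" check (dataset name taken from earlier in the thread; adjust to your own pool and container):
# list any snapshots under the dataset (clones and holds hang off snapshots)
zfs list -t snapshot -r lxd/containers/test1
# check whether the dataset itself is a clone and whether ZFS thinks it is mounted
zfs get origin,mounted lxd/containers/test1
# for each snapshot found above, check for user holds
zfs holds lxd/containers/test1@<snapshot>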