florian-bellencontre closed this 4 years ago
Can you show the output of dmesg?
This very much sounds like a kernel issue.
dmesg log:
[Thu Nov 12 14:38:47 2020] audit: type=1400 audit(1605188347.873:4559): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-elearning_</var/snap/lxd/common/lxd>" name="/bin/" pid=177441 comm="(ionclean)" flags="ro, remount, noatime, bind"
[Thu Nov 12 15:08:47 2020] audit: type=1400 audit(1605190148.042:4560): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-elearning_</var/snap/lxd/common/lxd>" name="/bin/" pid=183931 comm="(ionclean)" flags="ro, remount, noatime, bind"
[Thu Nov 12 15:38:47 2020] audit: type=1400 audit(1605191948.244:4561): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-elearning_</var/snap/lxd/common/lxd>" name="/bin/" pid=192543 comm="(ionclean)" flags="ro, remount, noatime, bind"
[Thu Nov 12 16:08:48 2020] audit: type=1400 audit(1605193748.425:4562): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-elearning_</var/snap/lxd/common/lxd>" name="/bin/" pid=58105 comm="(ionclean)" flags="ro, remount, noatime, bind"
[Thu Nov 12 16:38:48 2020] audit: type=1400 audit(1605195548.610:4563): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-elearning_</var/snap/lxd/common/lxd>" name="/bin/" pid=71249 comm="(ionclean)" flags="ro, remount, noatime, bind"
[Thu Nov 12 16:51:47 2020] veth800cf365: renamed from eth120
[Thu Nov 12 16:51:47 2020] br_vlan120: port 2(veth2724a662) entered disabled state
[Thu Nov 12 16:51:47 2020] veth5980fba7: renamed from eth136
[Thu Nov 12 16:51:47 2020] br_vlan136: port 2(vethf9de70c8) entered disabled state
[Thu Nov 12 16:51:48 2020] device vethf9de70c8 left promiscuous mode
[Thu Nov 12 16:51:48 2020] br_vlan136: port 2(vethf9de70c8) entered disabled state
[Thu Nov 12 16:51:48 2020] device veth2724a662 left promiscuous mode
[Thu Nov 12 16:51:48 2020] br_vlan120: port 2(veth2724a662) entered disabled state
OK, that's all fine. Can you show ps fauxww?
Unfortunately, the ps output is very long and it would take a long time to go through it and anonymize everything. It contains critical production data, so I can't give you the full output since I may miss something. If you can narrow down what you need, I can give you a more succinct extract.
I'm looking for any "zfs" process running in there.
The only zfs processes I can find are these:
root 148954 0.0 0.0 4636 1844 ? Ss Nov03 0:00 /bin/sh /snap/lxd/18150/commands/daemon.start
root 149530 2.4 0.1 6231828 197544 ? Sl Nov03 311:32 \_ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
lxd 150514 0.0 0.0 49964 3628 ? Ss Nov03 0:05 \_ dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --pid-file= --no-ping --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.154.77.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.154.77.2,10.154.77.254,1h --listen-address=fd42:3af8:9c1a:2bf8::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd -g lxd
root 32798 0.0 0.0 37764 4520 ? S 16:52 0:00 \_ zfs mount nvme1/lxd/containers/elearning
root 32799 0.0 0.0 22412 1388 ? D 16:52 0:00 | \_ /bin/mount --no-canonicalize -t zfs -o defaults,noatime,dev,exec,rw,suid,nomand,zfsutil nvme1/lxd/containers/elearning /var/snap/lxd/common/lxd/storage-pools/default/containers/elearning
root 3866 0.0 0.0 37764 4596 ? S 17:18 0:00 \_ zfs mount nvme1/lxd/containers/elearning
root 3867 0.0 0.0 22412 1328 ? D 17:18 0:00 | \_ /bin/mount --no-canonicalize -t zfs -o defaults,noatime,dev,exec,rw,suid,nomand,zfsutil nvme1/lxd/containers/elearning /var/snap/lxd/common/lxd/storage-pools/default/containers/elearning
root 36437 0.0 0.0 37764 4452 ? S 17:23 0:00 \_ zfs mount nvme1/lxd/containers/elearning
root 36438 0.0 0.0 22412 1308 ? D 17:23 0:00 | \_ /bin/mount --no-canonicalize -t zfs -o defaults,noatime,dev,exec,rw,suid,nomand,zfsutil nvme1/lxd/containers/elearning /var/snap/lxd/common/lxd/storage-pools/default/containers/elearning
root 2345 0.0 0.0 37764 4596 ? S 18:17 0:00 \_ zfs mount nvme1/lxd/containers/elearning
root 2346 0.0 0.0 22412 1336 ? D 18:17 0:00 \_ /bin/mount --no-canonicalize -t zfs -o defaults,noatime,dev,exec,rw,suid,nomand,zfsutil nvme1/lxd/containers/elearning /var/snap/lxd/common/lxd/storage-pools/default/containers/elearning
Thanks, this is a kernel bug then.
In the above output, you can see LXD running zfs mount ..., which calls /bin/mount, which then gets stuck in I/O wait (D state). This usually denotes being stuck on a kernel operation.
You can look at /proc/2346/stack for example to get a better idea of exactly what it's stuck on.
Unfortunately, there's nothing LXD can do in such cases. Those processes are unkillable and will only return if the kernel recovers. If it doesn't, the only option is a reboot.
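For anyone who wants to narrow down a long ps listing without sharing it, here is a minimal Python sketch of the check described above: pick out the processes in D state, then look at their kernel stack. The function names are hypothetical, the column layout is assumed to be the usual `ps aux` one (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND), and reading /proc/PID/stack normally requires root.

```python
from pathlib import Path

# Two lines shortened from the ps output earlier in this thread, plus a header.
SAMPLE = [
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND",
    "root 2345 0.0 0.0 37764 4596 ? S 18:17 0:00 \\_ zfs mount nvme1/lxd/containers/elearning",
    "root 2346 0.0 0.0 22412 1336 ? D 18:17 0:00 \\_ /bin/mount --no-canonicalize -t zfs nvme1/lxd/containers/elearning",
]


def find_uninterruptible(ps_lines):
    """Return (pid, command) for every line whose STAT column starts with 'D'."""
    stuck = []
    for line in ps_lines:
        # Assumed layout: 10 whitespace-separated columns, then the command.
        fields = line.strip().split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something we cannot parse
        if fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10]))
    return stuck


def read_kernel_stack(pid):
    """Return the contents of /proc/<pid>/stack, or a note if it is unreadable."""
    try:
        return Path(f"/proc/{pid}/stack").read_text()
    except OSError as exc:
        return f"<could not read /proc/{pid}/stack: {exc}>"


if __name__ == "__main__":
    for pid, command in find_uninterruptible(SAMPLE):
        print(pid, command)
```

On a live system you would feed the function the lines of real ps output instead of SAMPLE and call read_kernel_stack on each PID it returns. Only the stuck PIDs and their stacks then need to be shared.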
Required information
Issue description
When I made a snapshot and tried to restore it, the restore was stuck for more than 5 minutes. I stopped the process with Ctrl+C.
I then tried to start my container, but that was also stuck for several minutes.
Steps to reproduce
1. lxc snapshot command
2. lxc restore command.
3. lxc start, which also got stuck.

Information to attach
(dmesg) nothing related to lxd
(lxc info NAME --show-log) Log:
[ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
[ ] Output of the client with --debug
root@srv-235:~ # grep default/containers/elearning /proc/*/mountinfo
root@srv-235:~ #
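The empty grep result above means no process's mountinfo mentions the container path. The same check can be sketched in Python. Treat this as an illustration only: the function names and the sample line are made up, and unlike the grep, which matches anywhere in the line, it compares only the mount point column (the fifth field of mountinfo).

```python
import glob

# Hypothetical mountinfo line, showing what a successful mount might look like.
SAMPLE = (
    "512 30 0:55 / /var/snap/lxd/common/lxd/storage-pools/default/containers/elearning "
    "rw,noatime shared:300 - zfs nvme1/lxd/containers/elearning rw,xattr,posixacl\n"
)


def mount_points_matching(mountinfo_text, needle):
    """Return mount points (5th field of each mountinfo line) containing needle."""
    hits = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) > 4 and needle in fields[4]:
            hits.append(fields[4])
    return hits


def scan_all_processes(needle):
    """Map each /proc/<pid>/mountinfo path to its matching mount points."""
    found = {}
    for path in glob.glob("/proc/[0-9]*/mountinfo"):
        try:
            with open(path, errors="replace") as handle:
                hits = mount_points_matching(handle.read(), needle)
        except OSError:
            continue  # process exited meanwhile, or permission denied
        if hits:
            found[path] = hits
    return found


if __name__ == "__main__":
    print(mount_points_matching(SAMPLE, "default/containers/elearning"))
    print(scan_all_processes("default/containers/elearning"))
```

An empty dictionary from scan_all_processes corresponds to the empty grep output shown above.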