Ok, so to confirm, the steps are:

I created 2 bind mounts of zfs datasets in a CentOS 6 container, then rebooted it from a root console inside the container (as I was also changing uidmaps). Afterwards I created a 3rd bind mount of a zfs dataset, but this time rebooted via lxc restart to save time logging into the container.

Correct?
stgraber@castiana:~$ lxc launch images:centos/6 c6
Creating c6
Starting c6
stgraber@castiana:~$ lxc config device add c6 mnt disk source=/mnt path=/mnt/mnt
Device mnt added to c6
stgraber@castiana:~$ lxc config device add c6 srv disk source=/srv path=/mnt/srv
Device srv added to c6
stgraber@castiana:~$ lxc exec c6 bash
[root@c6 ~]# reboot
stgraber@castiana:~$ lxc config device add c6 opt disk source=/opt path=/mnt/opt
Device opt added to c6
stgraber@castiana:~$ lxc restart c6
stgraber@castiana:~$ lxc exec c6 -- grep /mnt /proc/mounts
castiana/ROOT/ubuntu /mnt/mnt zfs rw,relatime,xattr,posixacl 0 0
castiana/ROOT/ubuntu /mnt/opt zfs rw,relatime,xattr,posixacl 0 0
castiana/ROOT/ubuntu /mnt/srv zfs rw,relatime,xattr,posixacl 0 0
stgraber@castiana:~$
Can you show `zfs get all zpool/lxd/containers/centos6-builder`?
A reboot cleared the error & the container is working today:
NAME PROPERTY VALUE SOURCE
zpool012/lxd/containers/centos6-builder type filesystem -
zpool012/lxd/containers/centos6-builder creation Fri Mar 27 11:54 2020 -
zpool012/lxd/containers/centos6-builder used 739M -
zpool012/lxd/containers/centos6-builder available 41.7T -
zpool012/lxd/containers/centos6-builder referenced 952M -
zpool012/lxd/containers/centos6-builder compressratio 2.34x -
zpool012/lxd/containers/centos6-builder mounted no -
zpool012/lxd/containers/centos6-builder origin zpool012/lxd/deleted/images/cdc3a4ac4cfba8998ba6cdb0e29c14bf5b4c76bc7a1f8427e3d8cf3f696f498e@readonly -
zpool012/lxd/containers/centos6-builder quota none local
zpool012/lxd/containers/centos6-builder reservation none default
zpool012/lxd/containers/centos6-builder recordsize 128K default
zpool012/lxd/containers/centos6-builder mountpoint /var/snap/lxd/common/lxd/storage-pools/pool1/containers/centos6-builder local
zpool012/lxd/containers/centos6-builder sharenfs off default
zpool012/lxd/containers/centos6-builder checksum on default
zpool012/lxd/containers/centos6-builder compression lz4 inherited from zpool012
zpool012/lxd/containers/centos6-builder atime on default
zpool012/lxd/containers/centos6-builder devices on inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder exec on inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder setuid on inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder readonly off default
zpool012/lxd/containers/centos6-builder zoned off default
zpool012/lxd/containers/centos6-builder snapdir hidden default
zpool012/lxd/containers/centos6-builder aclinherit passthrough inherited from zpool012
zpool012/lxd/containers/centos6-builder createtxg 684944 -
zpool012/lxd/containers/centos6-builder canmount noauto local
zpool012/lxd/containers/centos6-builder xattr sa inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder copies 1 default
zpool012/lxd/containers/centos6-builder version 5 -
zpool012/lxd/containers/centos6-builder utf8only on -
zpool012/lxd/containers/centos6-builder normalization formD -
zpool012/lxd/containers/centos6-builder casesensitivity sensitive -
zpool012/lxd/containers/centos6-builder vscan off default
zpool012/lxd/containers/centos6-builder nbmand off default
zpool012/lxd/containers/centos6-builder sharesmb off default
zpool012/lxd/containers/centos6-builder refquota none default
zpool012/lxd/containers/centos6-builder refreservation none default
zpool012/lxd/containers/centos6-builder guid 17762718722670246463 -
zpool012/lxd/containers/centos6-builder primarycache all default
zpool012/lxd/containers/centos6-builder secondarycache all default
zpool012/lxd/containers/centos6-builder usedbysnapshots 0B -
zpool012/lxd/containers/centos6-builder usedbydataset 739M -
zpool012/lxd/containers/centos6-builder usedbychildren 0B -
zpool012/lxd/containers/centos6-builder usedbyrefreservation 0B -
zpool012/lxd/containers/centos6-builder logbias latency default
zpool012/lxd/containers/centos6-builder dedup off default
zpool012/lxd/containers/centos6-builder mlslabel none default
zpool012/lxd/containers/centos6-builder sync standard default
zpool012/lxd/containers/centos6-builder dnodesize legacy default
zpool012/lxd/containers/centos6-builder refcompressratio 2.22x -
zpool012/lxd/containers/centos6-builder written 739M -
zpool012/lxd/containers/centos6-builder logicalused 1.57G -
zpool012/lxd/containers/centos6-builder logicalreferenced 1.92G -
zpool012/lxd/containers/centos6-builder volmode default default
zpool012/lxd/containers/centos6-builder filesystem_limit none default
zpool012/lxd/containers/centos6-builder snapshot_limit none default
zpool012/lxd/containers/centos6-builder filesystem_count none default
zpool012/lxd/containers/centos6-builder snapshot_count none default
zpool012/lxd/containers/centos6-builder snapdev hidden default
zpool012/lxd/containers/centos6-builder acltype posixacl inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder context none default
zpool012/lxd/containers/centos6-builder fscontext none default
zpool012/lxd/containers/centos6-builder defcontext none default
zpool012/lxd/containers/centos6-builder rootcontext none default
zpool012/lxd/containers/centos6-builder relatime on inherited from zpool012
zpool012/lxd/containers/centos6-builder redundant_metadata all default
zpool012/lxd/containers/centos6-builder overlay off default
ZFS config looks correct (mountpoint & canmount are the usual suspects for that type of issue).
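For anyone checking their own pool against this later, those two properties (plus the live mount state) can be read in one go; a minimal sketch, substituting your own dataset name:

zfs get mountpoint,canmount,mounted zpool012/lxd/containers/centos6-builder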
The container was running Mesos - CentOS 6 containers do not have acpid, so perhaps they cannot be restarted cleanly from outside the container.
For containers we don't use ACPI; we signal the init system directly, and we have a clean shutdown test for all the images we publish.
@itoffshore is this still happening?
@stgraber - the problem fixed itself after a reboot, the same as for the original LXD forum user
Weird. Hopefully the recent mount table tweaks in the snap will help prevent this from happening again.
If someone else hits this, please let us know.
I have the identical problem. I can probably leave this in a broken state for a few hours, but ideally I'd like to reboot the host to clear this (this is the authoritative nameserver for my home network).
root@nuc2:~# lxc start ns-auth
Error: Failed preparing container for start: Failed to run: zfs mount zfs/lxd/containers/ns-auth: cannot mount 'zfs/lxd/containers/ns-auth': filesystem already mounted
Try `lxc info --show-log ns-auth` for more info
root@nuc2:~# lxc info --show-log ns-auth
Name: ns-auth
Location: none
Remote: unix://
Architecture: x86_64
Created: 2018/06/15 21:10 UTC
Status: Stopped
Type: container
Profiles: br255
Log:
root@nuc2:~#
The host (nuc2) is running Ubuntu 18.04 and lxd 4.11 from snap
root@nuc2:~# snap list
Name Version Rev Tracking Publisher Notes
core 16-2.49 10859 latest/stable canonical✓ core
core18 20210128 1988 latest/stable canonical✓ base
lxd 4.11 19566 latest/stable canonical✓ -
The container (ns-auth) was Ubuntu 16.04, and I'd just done a "do-release-upgrade" to update it to 18.04. It had just been rebooted from inside the container, but didn't come back up, and now I can't start it from the host either.
zfs properties:
root@nuc2:~# zfs get all zfs/lxd/containers/ns-auth
NAME PROPERTY VALUE SOURCE
zfs/lxd/containers/ns-auth type filesystem -
zfs/lxd/containers/ns-auth creation Fri Jun 15 22:10 2018 -
zfs/lxd/containers/ns-auth used 1.71G -
zfs/lxd/containers/ns-auth available 99.5G -
zfs/lxd/containers/ns-auth referenced 809M -
zfs/lxd/containers/ns-auth compressratio 1.77x -
zfs/lxd/containers/ns-auth mounted no -
zfs/lxd/containers/ns-auth quota none default
zfs/lxd/containers/ns-auth reservation none default
zfs/lxd/containers/ns-auth recordsize 128K default
zfs/lxd/containers/ns-auth mountpoint /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth local
zfs/lxd/containers/ns-auth sharenfs off default
zfs/lxd/containers/ns-auth checksum on default
zfs/lxd/containers/ns-auth compression lz4 inherited from zfs
zfs/lxd/containers/ns-auth atime on default
zfs/lxd/containers/ns-auth devices on default
zfs/lxd/containers/ns-auth exec on default
zfs/lxd/containers/ns-auth setuid on default
zfs/lxd/containers/ns-auth readonly off default
zfs/lxd/containers/ns-auth zoned off default
zfs/lxd/containers/ns-auth snapdir hidden default
zfs/lxd/containers/ns-auth aclinherit restricted default
zfs/lxd/containers/ns-auth createtxg 852601 -
zfs/lxd/containers/ns-auth canmount noauto local
zfs/lxd/containers/ns-auth xattr on default
zfs/lxd/containers/ns-auth copies 1 default
zfs/lxd/containers/ns-auth version 5 -
zfs/lxd/containers/ns-auth utf8only off -
zfs/lxd/containers/ns-auth normalization none -
zfs/lxd/containers/ns-auth casesensitivity sensitive -
zfs/lxd/containers/ns-auth vscan off default
zfs/lxd/containers/ns-auth nbmand off default
zfs/lxd/containers/ns-auth sharesmb off default
zfs/lxd/containers/ns-auth refquota none default
zfs/lxd/containers/ns-auth refreservation none default
zfs/lxd/containers/ns-auth guid 18270794170817550884 -
zfs/lxd/containers/ns-auth primarycache all default
zfs/lxd/containers/ns-auth secondarycache all default
zfs/lxd/containers/ns-auth usedbysnapshots 939M -
zfs/lxd/containers/ns-auth usedbydataset 809M -
zfs/lxd/containers/ns-auth usedbychildren 0B -
zfs/lxd/containers/ns-auth usedbyrefreservation 0B -
zfs/lxd/containers/ns-auth logbias latency default
zfs/lxd/containers/ns-auth dedup off default
zfs/lxd/containers/ns-auth mlslabel none default
zfs/lxd/containers/ns-auth sync standard default
zfs/lxd/containers/ns-auth dnodesize legacy default
zfs/lxd/containers/ns-auth refcompressratio 1.89x -
zfs/lxd/containers/ns-auth written 733M -
zfs/lxd/containers/ns-auth logicalused 2.79G -
zfs/lxd/containers/ns-auth logicalreferenced 1.38G -
zfs/lxd/containers/ns-auth volmode default default
zfs/lxd/containers/ns-auth filesystem_limit none default
zfs/lxd/containers/ns-auth snapshot_limit none default
zfs/lxd/containers/ns-auth filesystem_count none default
zfs/lxd/containers/ns-auth snapshot_count none default
zfs/lxd/containers/ns-auth snapdev hidden default
zfs/lxd/containers/ns-auth acltype off default
zfs/lxd/containers/ns-auth context none default
zfs/lxd/containers/ns-auth fscontext none default
zfs/lxd/containers/ns-auth defcontext none default
zfs/lxd/containers/ns-auth rootcontext none default
zfs/lxd/containers/ns-auth relatime off default
zfs/lxd/containers/ns-auth redundant_metadata all default
zfs/lxd/containers/ns-auth overlay off default
(note mounted no). I can't find any more useful LXD logs:
root@nuc2:~# ls -l /var/snap/lxd/common/lxd/logs/ns-auth/
total 6
-rw-r--r-- 1 root root 0 Mar 6 20:05 forkexec.log
-rw-r----- 1 root root 2178 Dec 27 08:23 lxc.conf
-rw-r----- 1 root root 0 Mar 7 10:48 lxc.log
-rw-r----- 1 root root 0 Mar 7 10:48 lxc.log.old
root@nuc2:~# tail -6 /var/snap/lxd/common/lxd/logs/lxd.log
t=2021-03-07T09:08:01+0000 lvl=info msg="Pruning expired instance backups"
t=2021-03-07T09:08:01+0000 lvl=info msg="Done pruning expired instance backups"
t=2021-03-07T09:59:13+0000 lvl=warn msg="Detected poll(POLLNVAL) event."
t=2021-03-07T10:08:01+0000 lvl=info msg="Pruning expired instance backups"
t=2021-03-07T10:08:01+0000 lvl=info msg="Done pruning expired instance backups"
t=2021-03-07T10:31:26+0000 lvl=warn msg="Detected poll(POLLNVAL) event."
root@nuc2:~#
An strace of the lxd process (13251) and its descendants with strace -s1024 -f -p 13251 2>ert shows:
...
[pid 16917] execve("/snap/lxd/current/zfs-0.7/bin/zfs", ["zfs", "mount", "zfs/lxd/containers/ns-auth"], 0xc000b22a00 /* 39 vars */ <unfinished ...>
...
[pid 16917] openat(AT_FDCWD, "/proc/self/mounts", O_RDONLY) = 4
[pid 16917] openat(AT_FDCWD, "/etc/dfs/sharetab", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 16917] openat(AT_FDCWD, "/dev/zfs", O_RDWR) = 5
...
[pid 16917] read(4, ...
[pid 16917] read(4, ...
[pid 16917] read(4, ...
[pid 16917] read(4, ...
[pid 16917] read(4, ...
[pid 16917] read(4, ...\nzfs/lxd/containers/ns-auth /var/snap/lxd/common/shmounts/storage-pools/", 1024) = 1024
[pid 16917] read(4, "default/containers/ns-auth zfs rw,xattr,noacl 0 0\n...
[pid 16917] read(4, "...", 1024) = 807
[pid 16917] read(4, "", 1024) = 0
[pid 16917] write(2, "cannot mount 'zfs/lxd/containers/ns-auth': filesystem already mounted\n", 70) = 70
And indeed, I can see that while ns-auth is not mounted on the host, it is in /proc/<pid>/mounts of the lxd process:
root@nuc2:~# wc -l /proc/mounts; grep ns-auth /proc/mounts
49 /proc/mounts
root@nuc2:~# wc -l /proc/13251/mounts; grep ns-auth /proc/13251/mounts
113 /proc/13251/mounts
zfs/lxd/containers/ns-auth /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth zfs rw,xattr,noacl 0 0
root@nuc2:~#
I can see this with nsenter too:
root@nuc2:~# nsenter -t 13251 grep ns-auth /proc/mounts
root@nuc2:~# nsenter -t 13251 -m grep ns-auth /proc/mounts
zfs/lxd/containers/ns-auth /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth zfs rw,xattr,noacl 0 0
root@nuc2:~#
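The same host-versus-namespace comparison can be scripted for any container dataset; a minimal sketch, assuming (as in the ps output later in this thread) that the snap's daemon is the process whose command line starts with lxd --logfile:

# PID of the snap's lxd daemon (assumes a single daemon matching "lxd --logfile ...")
LXD_PID=$(pgrep -f '^lxd --logfile' | head -n1)
# mounts visible to the host
grep 'lxd/containers/' /proc/mounts
# mounts visible only inside the daemon's mount namespace
nsenter -t "$LXD_PID" -m -- grep 'lxd/containers/' /proc/mounts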
However I'm out of my depth here. I can't do the unmount:
root@nuc2:~# nsenter -t 13251 -m umount /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth
umount: /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth: no mount point specified.
root@nuc2:~# nsenter -t 13251 -m ls /var/snap/lxd/common/shmounts/storage-pools/
ls: cannot access '/var/snap/lxd/common/shmounts/storage-pools/': No such file or directory
root@nuc2:~# nsenter -t 13251 -m ls /var/snap/lxd/common/shmounts/
instances lxcfs
root@nuc2:~# nsenter -t 13251 -m zfs get all zfs/lxd/containers/ns-auth | grep mount
nsenter: failed to execute zfs: No such file or directory
root@nuc2:~# nsenter -t 13251 -m /snap/lxd/current/zfs-0.7/bin/zfs get all zfs/lxd/containers/ns-auth | grep mount
/snap/lxd/current/zfs-0.7/bin/zfs: error while loading shared libraries: libnvpair.so.1: cannot open shared object file: No such file or directory
Is there anything else you want me to check before rebooting?
BTW there's nothing unusual about the config of this container.
root@nuc2:~# lxc config show -e ns-auth
architecture: x86_64
config:
volatile.base_image: 8220e89e33e6f62b56cb451cfed61574074416a66a6e7c61ff574d95572e6661
volatile.eth0.hwaddr: 00:16:3e:27:fe:a9
volatile.idmap.base: "0"
volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
volatile.last_state.power: STOPPED
volatile.uuid: 43b39bcd-17a8-447f-83a4-dd6f0aeda98c
devices:
eth0:
name: eth0
nictype: bridged
parent: br255
type: nic
root:
path: /
pool: default
type: disk
ephemeral: false
profiles:
- br255
stateful: false
description: ""
root@nuc2:~#
Ha, got the zfs binary to work (needed to copy LD_LIBRARY_PATH from /proc/13251/environ):
root@nuc2:~# nsenter -t 13251 -a
mesg: ttyname failed: No such device
root@nuc2:/# export LD_LIBRARY_PATH=/snap/lxd/current/zfs-0.7/lib/:/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void:/snap/lxd/19566/lib:/snap/lxd/19566/lib/x86_64-linux-gnu:/snap/lxd/19566/lib/x86_64-linux-gnu/ceph:/snap/lxd/19566/zfs-0.6/lib:/snap/lxd/19566/zfs-2.0/lib:/snap/lxd/19566/lib:/snap/lxd/19566/lib/x86_64-linux-gnu:/snap/lxd/current/lib:/snap/lxd/current/lib/x86_64-linux-gnu:/snap/lxd/current/lib/x86_64-linux-gnu/ceph
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs get all zfs/lxd/containers/ns-auth | grep mount
zfs/lxd/containers/ns-auth mounted yes -
zfs/lxd/containers/ns-auth mountpoint /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth local
zfs/lxd/containers/ns-auth canmount noauto local
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs get all zfs/lxd/containers/ns-auth
NAME PROPERTY VALUE SOURCE
zfs/lxd/containers/ns-auth type filesystem -
zfs/lxd/containers/ns-auth creation Fri Jun 15 22:10 2018 -
zfs/lxd/containers/ns-auth used 1.71G -
zfs/lxd/containers/ns-auth available 99.4G -
zfs/lxd/containers/ns-auth referenced 809M -
zfs/lxd/containers/ns-auth compressratio 1.77x -
zfs/lxd/containers/ns-auth mounted yes -
zfs/lxd/containers/ns-auth quota none default
zfs/lxd/containers/ns-auth reservation none default
zfs/lxd/containers/ns-auth recordsize 128K default
zfs/lxd/containers/ns-auth mountpoint /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth local
zfs/lxd/containers/ns-auth sharenfs off default
zfs/lxd/containers/ns-auth checksum on default
zfs/lxd/containers/ns-auth compression lz4 inherited from zfs
zfs/lxd/containers/ns-auth atime on default
zfs/lxd/containers/ns-auth devices on default
zfs/lxd/containers/ns-auth exec on default
zfs/lxd/containers/ns-auth setuid on default
zfs/lxd/containers/ns-auth readonly off default
zfs/lxd/containers/ns-auth zoned off default
zfs/lxd/containers/ns-auth snapdir hidden default
zfs/lxd/containers/ns-auth aclinherit restricted default
zfs/lxd/containers/ns-auth createtxg 852601 -
zfs/lxd/containers/ns-auth canmount noauto local
zfs/lxd/containers/ns-auth xattr on default
zfs/lxd/containers/ns-auth copies 1 default
zfs/lxd/containers/ns-auth version 5 -
zfs/lxd/containers/ns-auth utf8only off -
zfs/lxd/containers/ns-auth normalization none -
zfs/lxd/containers/ns-auth casesensitivity sensitive -
zfs/lxd/containers/ns-auth vscan off default
zfs/lxd/containers/ns-auth nbmand off default
zfs/lxd/containers/ns-auth sharesmb off default
zfs/lxd/containers/ns-auth refquota none default
zfs/lxd/containers/ns-auth refreservation none default
zfs/lxd/containers/ns-auth guid 18270794170817550884 -
zfs/lxd/containers/ns-auth primarycache all default
zfs/lxd/containers/ns-auth secondarycache all default
zfs/lxd/containers/ns-auth usedbysnapshots 939M -
zfs/lxd/containers/ns-auth usedbydataset 809M -
zfs/lxd/containers/ns-auth usedbychildren 0B -
zfs/lxd/containers/ns-auth usedbyrefreservation 0B -
zfs/lxd/containers/ns-auth logbias latency default
zfs/lxd/containers/ns-auth dedup off default
zfs/lxd/containers/ns-auth mlslabel none default
zfs/lxd/containers/ns-auth sync standard default
zfs/lxd/containers/ns-auth dnodesize legacy default
zfs/lxd/containers/ns-auth refcompressratio 1.89x -
zfs/lxd/containers/ns-auth written 733M -
zfs/lxd/containers/ns-auth logicalused 2.79G -
zfs/lxd/containers/ns-auth logicalreferenced 1.38G -
zfs/lxd/containers/ns-auth volmode default default
zfs/lxd/containers/ns-auth filesystem_limit none default
zfs/lxd/containers/ns-auth snapshot_limit none default
zfs/lxd/containers/ns-auth filesystem_count none default
zfs/lxd/containers/ns-auth snapshot_count none default
zfs/lxd/containers/ns-auth snapdev hidden default
zfs/lxd/containers/ns-auth acltype off default
zfs/lxd/containers/ns-auth context none default
zfs/lxd/containers/ns-auth fscontext none default
zfs/lxd/containers/ns-auth defcontext none default
zfs/lxd/containers/ns-auth rootcontext none default
zfs/lxd/containers/ns-auth relatime off default
zfs/lxd/containers/ns-auth redundant_metadata all default
zfs/lxd/containers/ns-auth overlay off default
However I still can't unmount it:
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs unmount zfs/lxd/containers/ns-auth
umount: /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth': umount failed
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs unmount /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth
cannot unmount '/var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth': not a mountpoint
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs get mountpoint zfs/lxd/containers/ns-auth
NAME PROPERTY VALUE SOURCE
zfs/lxd/containers/ns-auth mountpoint /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth local
/proc/mounts says it's mounted somewhere else - but unmounting that path doesn't work either:
root@nuc2:/# grep ns-auth /proc/mounts
zfs/lxd/containers/ns-auth /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth zfs rw,xattr,noacl 0 0
root@nuc2:/# ls /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth
ls: cannot access '/var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth': No such file or directory
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs unmount /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth': No such file or directory
I've restarted the server now. Sorry.
I have this problem again. I am now running lxd 4.18 from snap under Ubuntu 18.04.5, kernel linux-image-generic-hwe-18.04 (5.4.0.81.91~18.04.73), zfsutils-linux 0.7.5-1ubuntu16.12, and my default pool is zfs.
I have a number of containers, including one called "netbox" and another called "netbox3". I simply wanted to rename the container "netbox" (which was running fine) to "netbox2".
When I attempt this it fails:
root@nuc2:~# lxc rename netbox netbox2
Error: Renaming of running container not allowed
root@nuc2:~# lxc stop netbox
root@nuc2:~# lxc rename netbox netbox2
Error: Rename instance: Failed to run: zfs rename zfs/lxd/containers/netbox zfs/lxd/containers/netbox2: umount: /var/snap/lxd/common/shmounts/storage-pools/default/containers/netbox: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/default/containers/netbox': umount failed
And the container is unchanged:
root@nuc2:~# lxc list netbox
+---------+---------+---------------------+-------------------------------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+---------+---------+---------------------+-------------------------------+-----------+-----------+
| netbox | STOPPED | | | CONTAINER | 0 |
+---------+---------+---------------------+-------------------------------+-----------+-----------+
| netbox3 | RUNNING | 10.12.255.50 (eth0) | 2a01:5d00:1000:8ff::50 (eth0) | CONTAINER | 0 |
+---------+---------+---------------------+-------------------------------+-----------+-----------+
root@nuc2:~# lxc storage list
+---------+--------+--------------------------------+-------------+---------+
| NAME | DRIVER | SOURCE | DESCRIPTION | USED BY |
+---------+--------+--------------------------------+-------------+---------+
| default | zfs | zfs/lxd | | 18 |
+---------+--------+--------------------------------+-------------+---------+
| plain | dir | /var/lib/snapd/hostfs/data/lxd | | 0 |
+---------+--------+--------------------------------+-------------+---------+
root@nuc2:~# zfs list -r zfs/lxd | grep netbox
zfs/lxd/containers/netbox 1.40G 92.1G 1.20G /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox
zfs/lxd/containers/netbox3 1.98G 92.1G 1.87G /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox3
But more importantly, I now can't even restart it, and the log is empty.
root@nuc2:~# lxc start netbox
Error: Failed preparing container for start: Failed to run: zfs mount zfs/lxd/containers/netbox: cannot mount 'zfs/lxd/containers/netbox': filesystem already mounted
Try `lxc info --show-log netbox` for more info
root@nuc2:~# lxc info --show-log netbox
Name: netbox
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2020/04/13 17:11 BST
Last Used: 2021/07/21 10:44 BST
Log:
root@nuc2:~# zfs get all zfs/lxd/containers/netbox | grep mount
zfs/lxd/containers/netbox mounted no -
zfs/lxd/containers/netbox mountpoint /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox local
zfs/lxd/containers/netbox canmount noauto local
root@nuc2:~#
However, if I create a fresh container for testing, it's happy:
root@nuc2:~# lxc init ubuntu:18.04 test123
Creating test123
root@nuc2:~# zfs list -r zfs/lxd | grep test
zfs/lxd/containers/test123 196K 92.1G 453M /var/snap/lxd/common/lxd/storage-pools/default/containers/test123
root@nuc2:~# lxc rename test123 test456
root@nuc2:~# lxc start test456
root@nuc2:~# lxc stop test456
root@nuc2:~# lxc rename test456 test789
root@nuc2:~# lxc delete test789
root@nuc2:~#
root@nuc2:~# lxc init ubuntu:18.04 blah
Creating blah
root@nuc2:~# lxc init ubuntu:20.04 blah3
Creating blah3
root@nuc2:~# lxc rename blah blah2
root@nuc2:~# lxc delete blah2 blah3
root@nuc2:~#
Once again, I can see that the lxd process itself has this filesystem in its mounts:
root@nuc2:~# ps auxwww | grep lxd
...
root 11576 0.0 0.0 2616 352 ? Ss Sep06 0:00 /bin/sh /snap/lxd/21468/commands/daemon.start
root 11739 0.1 0.6 1924312 50564 ? Sl Sep06 5:44 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root@nuc2:~# grep netbox /proc/11739/mounts
zfs/lxd/containers/netbox /var/snap/lxd/common/shmounts/storage-pools/default/containers/netbox zfs rw,xattr,noacl 0 0
zfs/lxd/containers/netbox3 /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox3 zfs rw,xattr,noacl 0 0
root@nuc2:~#
(EDIT: interesting that one is /var/snap/lxd/common/shmounts/... and one is /var/snap/lxd/common/lxd/.... I observe that the directory /var/snap/lxd/common/shmounts exists but is empty. I also see that /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox and /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox3 both exist, but both appear to be empty.)
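If it helps, the affected containers can be enumerated directly from the daemon's mount table: any dataset still mounted under the stale shmounts path is one of the stuck ones. A quick sketch, using the lxd PID (11739) from the ps output above:

grep 'lxd/containers/' /proc/11739/mounts | grep '/var/snap/lxd/common/shmounts/'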
I am happy to leave this container in a broken state for a day or two, if you can give me other commands to poke it with.
I've reopened this as there still seem to be some issues with ZFS when the snap package is refreshed. Did you reboot the machine after day 2, and did that fix the issue again?
It may or may not be related, but soon afterwards this machine started to report some errors with the zpool on the SSD:
The number of checksum errors associated with a ZFS device
exceeded acceptable levels. ZFS has marked the device as
degraded.
impact: Fault tolerance of the pool may be compromised.
eid: 18658
class: statechange
state: DEGRADED
This is a single-device vdev (no redundancy).
Although it still appeared to be functioning, I took the precaution of replacing the SSD, and rebuilding the machine with Ubuntu 20.04 while I was at it, so I'm afraid that means the error state is now lost.
Ah OK, that sounds suspect. I'll close again, but if you see this recur without any ZFS errors then let us know. Thanks
I know this issue is closed, but I'd like to add some information since I faced the same issue. In my case the error message was:
>lxc start mycontainer
Error: Failed preparing container for start: Failed to run: zfs mount zpool1/lxd/containers/mycontainer: cannot mount 'zpool1/lxd/containers/mycontainer': filesystem already mounted
Try `lxc info --show-log mycontainer` for more info
After this I thought: OK, if it is mounted, I'll unmount it:
>zfs umount zpool1/lxd/containers/mycontainer
cannot unmount 'zpool1/lxd/containers/mycontainer': not currently mounted
Hmmm, that's really suspect. In a strange state of mind I thought: OK, then I'll mount it and help LXD...
>zfs mount zpool1/lxd/containers/mycontainer
and it worked - after that I could restart the container without rebooting the server.
I cannot explain it, but it solved my issue, and I have since hit this situation a second time and it worked again - so maybe this helps anybody who faces the same issue. Although this cannot be the long-term solution, it helped me out.
LXD seems to think that this filesystem is already mounted, but zfs says it isn't - and if you mount it manually with zfs directly, LXD accepts that and starts up.
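Collected in one place, the workaround described above is simply the following (names taken from this report; adjust to your own pool and container, and note this only masks the problem):

# zfs itself reports the dataset as not mounted, so mount it manually
zfs mount zpool1/lxd/containers/mycontainer
# after which the container starts normally
lxc start mycontainer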
We recently switched to using the normal mount command and a ZFS mountpoint=none setting, after we saw that using zfs mount was sometimes causing the volume to be mounted in the host's mount namespace rather than the snap's namespace (even though we were running the command from the snap's mount namespace).
Using the legacy tooling and telling ZFS not to manage its own mountpoint seemed to help when one of our devs hit the same problem using an existing ZFS pool with LXD as a sub-dataset.
https://github.com/lxc/lxd/pull/9349 https://github.com/lxc/lxd/pull/9353
This seems like a ZFS bug somehow, but hopefully these changes will work around it.
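To illustrate what that change means in ZFS terms, here is a rough sketch against a placeholder dataset (this is not the exact code from the PRs above, and whether mount(8) needs the zfsutil option depends on the ZFS version in use):

# stop ZFS from managing the mountpoint itself
zfs set mountpoint=none canmount=noauto mypool/containers/c1
# mount and unmount with the ordinary mount tooling instead of zfs mount/umount
mount -t zfs -o zfsutil mypool/containers/c1 /var/snap/lxd/common/lxd/storage-pools/default/containers/c1
umount /var/snap/lxd/common/lxd/storage-pools/default/containers/c1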
I am using 4.20 and I still encountered this problem. The only way to solve it is to reboot the entire system. I am using the snap build of lxd.
Can we reopen this?
Can you try:
zfs list -o name,mountpoint,canmount,mounted
If the top-level dataset in the lxd pool or any of the individual container datasets has a mountpoint set, then unset it like this:
zfs set mountpoint=none canmount=noauto foo/bar
See this discussion.
NAME MOUNTPOINT CANMOUNT MOUNTED
default none on no
default/containers none on no
default/containers/CCOMMERCE1 none noauto no
So zfs set mountpoint=none canmount=noauto default/containers?
Since mountpoint=none is already set, I doubt it will make a difference, but you can try it.
If the problem remains, then I suggest you start from scratch: show exactly what you see on your system, what commands you type, what errors you see, and what prompts the error to occur (you already said that rebooting the system clears the problem).
First, I was restoring a container from a snapshot (lxd had been updated from snap):
lxc restore magento-dev clean
Then it couldn't start:
lxc start magento-dev
gives that error.
@v3ss0n shall we pick a single place to discuss this rather than spread it over two threads?
Would you like to proceed here or over at https://discuss.linuxcontainers.org/t/time-to-fix-this-once-and-for-all-failed-preparing-container-for-start-failed-to-run-zfs-set-mountpoint-none-canmount-noauto/12662?
Yes, I am there now.
So I would suggest removing any mount points on your ZFS datasets, as from https://discuss.linuxcontainers.org/t/time-to-fix-this-once-and-for-all-failed-preparing-container-for-start-failed-to-run-zfs-set-mountpoint-none-canmount-noauto/12662/4?u=tomp I can see some of your datasets have mount points which may be causing ZFS issues.
A dataset is the value in the "NAME" column from zfs list output.
Related issue: https://github.com/lxc/lxd-pkg-snap/issues/61
(Seems to be triggered by updates of lxd snap and/or core snap)
Seems so - mine was updated a week ago, I think. I should create a script to snapshot and restore a few hundred times and run it overnight. Maybe I will try this weekend.
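A rough sketch of such an overnight loop, in case it is useful to anyone else hunting for a reproducer (container name, snapshot name and iteration count are placeholders; the container is assumed to exist and be stopped):

#!/bin/sh
# Restore a snapshot and start/stop the container repeatedly,
# stopping as soon as any step fails with the mount error.
lxc snapshot stress-test base
for i in $(seq 1 500); do
    lxc restore stress-test base || { echo "restore failed on iteration $i"; exit 1; }
    lxc start stress-test        || { echo "start failed on iteration $i"; exit 1; }
    lxc stop stress-test
done
echo "no failure after 500 iterations"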
The same problem happened in lxd 4.22 with zfs; I can't find the reason why it happened:
root@rainyun:~# lxc start bt611252
Error: Failed to run: zfs set mountpoint=none canmount=noauto data/containers/bt611252: umount: /var/snap/lxd/common/shmounts/storage-pools/data/containers/bt611252: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/data/containers/bt611252': umount failed
Try `lxc info --show-log bt611252` for more info
Same here, lxc 4.22 via snap on Ubuntu 20.04.3 LTS. After editing a profile, some containers (unrelated to this profile) can't be rebooted or started.
# lxc start e0028
Error: Failed to run: zfs set mountpoint=none canmount=noauto pool1/containers/e0028: umount: /var/snap/lxd/common/shmounts/storage-pools/pool1/containers/e0028: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/pool1/containers/e0028': umount failed
Editing any profile also displays errors:
# lxc profile edit large
- Project: default, Instance: t0618: Failed to write backup file: Failed to run: zfs set mountpoint=none canmount=noauto pool1/containers/t0618: umount: /var/snap/lxd/common/shmounts/storage-pools/pool1/containers/t0618: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/pool1/containers/t0618': umount failed
8 containers using this profile are listed in the error - but 21 total use this profile, so 13 of them are not affected. Those can be stopped and restarted normally.
In zfs list, the MOUNTPOINT column is "none" for almost all containers, even those which (so far) behave correctly.
We tried this advice from TomP to remove the mountpoint for containers which had a mountpoint defined, to no avail:
sudo zfs set mountpoint=none canmount=noauto pool1/containers/container_with_mountpoint
Right now we have tens of production containers hit. Restarting the host is not an option, as it would stop containers which are still working.
We would really appreciate it if you could provide us with commands or a script to "repair" stuck containers - and fix the root cause of the problem if you can, of course...
@tomponline Tom, should we open an other issue, or is it OK if we continue on this one (as @stgraber wrote last year above)?
I've reopened this one, although I don't know that there is much we can do until the snapd or zfs bug (whichever is causing it) is resolved.
What we are really missing is a reliable reproducer.
Closing as it's not a LXD issue, we have a packaging bug open to track this instead. https://github.com/lxc/lxd-pkg-snap/issues/61
Some more context: it seems to be related to snap upgrading lxd, which triggers:
Feb 01 17:50:20 h-h04 lxd.daemon[1645395]: Failed to mount new mntns: Invalid argument
journalctl -u snap.lxd.daemon -S 2022-02-01
(...)
Feb 01 17:50:05 h-h04 systemd[1]: Stopping Service for snap application lxd.daemon...
Feb 01 17:50:05 h-h04 lxd.daemon[1644835]: => Stop reason is: snap refresh
Feb 01 17:50:05 h-h04 lxd.daemon[1644835]: => Stopping LXD
Feb 01 17:50:07 h-h04 lxd.daemon[3927214]: => LXD exited cleanly
Feb 01 17:50:07 h-h04 lxd.daemon[1644835]: ==> Stopped LXD
Feb 01 17:50:07 h-h04 systemd[1]: snap.lxd.daemon.service: Succeeded.
Feb 01 17:50:07 h-h04 systemd[1]: Stopped Service for snap application lxd.daemon.
Feb 01 17:50:20 h-h04 systemd[1]: Started Service for snap application lxd.daemon.
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: => Preparing the system (22306)
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Loading snap configuration
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up mntns symlink (mnt:[4026536416])
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/storage-pools
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/devices
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up persistent shmounts path
Feb 01 17:50:20 h-h04 lxd.daemon[1645395]: Failed to mount new mntns: Invalid argument
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ====> Failed to setup shmounts, continuing without
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ====> Making LXD shmounts use the persistent path
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ====> Making LXCFS use the persistent path
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up kmod wrapper
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Preparing /boot
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Preparing a clean copy of /run
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Preparing /run/bin
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Preparing a clean copy of /etc
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Preparing a clean copy of /usr/share/misc
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Setting up ceph configuration
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Setting up LVM configuration
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Setting up OVN configuration
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Rotating logs
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Setting up ZFS (0.8)
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Escaping the systemd cgroups
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ====> Detected cgroup V1
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Escaping the systemd process resource limits
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Disabling shiftfs on this kernel (auto)
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: => Starting LXCFS
(...)
We had the same problem on some previous refreshes, which triggered the same damn error (but we didn't open a ticket at that time):
xyz@h-h04:/var/log# journalctl -u snap.lxd.daemon | grep "Failed to setup shmounts"
Jun 17 02:52:00 h-h04 lxd.daemon[4075130]: ====> Failed to setup shmounts, continuing without
Aug 10 02:42:11 h-h04 lxd.daemon[3755743]: ====> Failed to setup shmounts, continuing without
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ====> Failed to setup shmounts, continuing without
A previous refresh triggered no error on "Setting up mntns symlink":
Feb 03 02:05:05 h-h04 systemd[1]: Stopping Service for snap application lxd.daemon...
Feb 03 02:05:05 h-h04 lxd.daemon[2655427]: => Stop reason is: snap refresh
Feb 03 02:05:05 h-h04 lxd.daemon[2655427]: => Stopping LXD
Feb 03 02:05:07 h-h04 lxd.daemon[1645329]: => LXD exited cleanly
Feb 03 02:05:07 h-h04 lxd.daemon[2655427]: ==> Stopped LXD
Feb 03 02:05:07 h-h04 systemd[1]: snap.lxd.daemon.service: Succeeded.
Feb 03 02:05:07 h-h04 systemd[1]: Stopped Service for snap application lxd.daemon.
Feb 03 02:05:17 h-h04 systemd[1]: Started Service for snap application lxd.daemon.
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: => Preparing the system (22340)
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Loading snap configuration
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Setting up mntns symlink (mnt:[4026536416])
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Setting up kmod wrapper
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Preparing /boot
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Preparing a clean copy of /run
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Preparing /run/bin
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Preparing a clean copy of /etc
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Preparing a clean copy of /usr/share/misc
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Setting up ceph configuration
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Setting up LVM configuration
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Setting up OVN configuration
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Rotating logs
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Setting up ZFS (0.8)
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Escaping the systemd cgroups
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ====> Detected cgroup V1
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Escaping the systemd process resource limits
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Disabling shiftfs on this kernel (auto)
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: => Re-using existing LXCFS
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: => Starting LXD
We're not going to reboot the server soon, so we'll be happy to provide logs if it can help.
@tomponline, is there any known solution to repair "broken" containers? Is zfs set mountpoint=none canmount=noauto pool1/containers/container_with_mountpoint supposed to do any good?
Yes, it seems to occur on snap refresh, but it's not clear what's causing it.
I'd recommend posting that comment in the other issue. There are some ugly fixes to restore some functionality but the only fix which will fix it all is a reboot.
We really desperately need a reproducer, so a clear set of steps which cause the problem to show up. Once we have that, it should be just a few hours/days for us to update the logic to fix whatever is wrong, but so far, we've never had anything other than "it happens after a few months" and looking at the damage doesn't help us find the cause...
My best guess currently is that it's caused by a particular sequence of core20 and lxd snap refreshes. LXD refreshes on their own never cause this, but the fact that it only hits those who don't regularly update+reboot their systems makes me think it has to do with core20 itself updating potentially 2-3 times, which then screws things up on the next lxd refresh; but that's so far been impossible to confirm.
@stgraber do you consider this https://github.com/lxc/lxd-pkg-snap/issues/61#issuecomment-674092760 an "ugly fix"? It worked well on our server, fixed all containers, and avoided a reboot... so we were thinking about defining it as our official life-saver, and even automating it if umount errors show up in the logs... Rebooting is a much worse "fix"!
I consider it to be an ugly fix because as a result of doing that you won't be able to pass in new devices into those containers until they're restarted and things like file transfers may also be affected in some cases.
Required information

Distribution: Ubuntu
Distribution version: 18.04
Installed via: snapd
Output of lxc info: (not included)
":Issue description
This is the same issue as on the LXD forums here & this githib issue concerning
zfs
not being namespace aware.hopefully this info helps some others:
Opening this issue to note that this problem on
zfs
seems to NOT occur if you restart a container from a root console inside the container after adding a bind mountSteps to reproduce
After adding a 3rd bind mount to a container with:
lxc config device add container-name share-name disk source=/zpool/some/dataset path=/home/username
and then rebooting the container with lxc restart, the container fails to restart:
Error: Common start logic: Failed to run: zfs mount zpool/lxd/containers/container-name: cannot mount 'zpool/lxd/containers/container-name': filesystem already mounted
After adding the 2 previous bind mounts I rebooted the container from inside the container with a reboot as root (which worked ok).
Information to attach

dmesg - nothing meaningful
lxc info NAME --show-log - empty log
lxc config show NAME --expanded
lxc monitor while reproducing the issue