canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Inconsistent container behavior with `shiftfs` #8490

Closed techtonik closed 2 years ago

techtonik commented 3 years ago

Required information

Issue description

LXD behaves inconsistently when launching two containers, repodraw and u3. I would expect the operations below to be idempotent and independent of the container name, but they are not. Graphviz fails in the first container and works as expected in the second.

$ lxc rm -f repodraw && lxc launch images:ubuntu/20.10 repodraw && lxc exec repodraw -- apt-get -y -qq install graphviz && lxc exec repodraw -- dot -v
...
dot - graphviz version 2.43.0 (0)
There is no layout engine support for "dot"
Perhaps "dot -c" needs to be run (with installer's privileges) to register the plugins?
$ lxc rm -f u3 && lxc launch images:ubuntu/20.10 u3 && lxc exec u3 -- apt-get -y -qq install graphviz && lxc exec u3 -- dot -v
...
dot - graphviz version 2.43.0 (0)
libdir = "/usr/lib/x86_64-linux-gnu/graphviz"
Activated plugin library: libgvplugin_dot_layout.so.6
Using layout: dot:dot_layout
...

I guess this was caused by setting shift=true at some point - https://discuss.linuxcontainers.org/t/secure-and-user-friendly-mounts-for-unprivileged/10284/3?u=techtonik - but lxc rm should clean it up, right?

Information to attach

lxc repodraw 20210220003526.613 WARN cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1129 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.monitor.repodraw"
lxc repodraw 20210220003526.615 WARN cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1129 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.payload.repodraw"
lxc repodraw 20210220003526.623 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1550 - No such file or directory - Failed to fchownat(17, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )

Log:

lxc u3 20210220002258.141 WARN cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1129 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.monitor.u3"
lxc u3 20210220002258.143 WARN cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1129 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.payload.u3"
lxc u3 20210220002258.149 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1550 - No such file or directory - Failed to fchownat(17, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )



 - [ ] Container configuration (`lxc config show NAME --expanded`)
 - [ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
 - [ ] Output of the client with --debug
 - [ ] Output of the daemon with --debug (alternatively output of `lxc monitor` while reproducing the issue)

stgraber commented 3 years ago

I'm unable to reproduce your issue, all containers behave like your second one here regardless of name. This is extremely unlikely to be a LXD issue and I don't see the relation with shift=true given the two containers you're creating here do not have any such attached device in the first place.

techtonik commented 3 years ago

So how do I debug what's going on?

techtonik commented 3 years ago

I compared the lxc monitor logs, and there are no differences apart from minor changes in message order.

techtonik commented 3 years ago

I ran strace dot -v inside the container and could not make sense of the output. The error probably occurs during installation of the packages.

Restarting the daemon didn't help. The error went away only after I disabled shiftfs and restarted the LXD daemon.

sudo snap set lxd shiftfs.enable=false
sudo systemctl reload snap.lxd.daemon

Most likely the filesystem driver remembered that at some point I attached a device with lxc config device add "$NAME" "$NAME-shared" disk source="$PWD" path="/root/$NAME" shift=true.
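To rule out the device itself still being attached to the recreated container, the expanded configuration can be checked for a shift entry. This is only a sketch: has_shift_device is a hypothetical helper, and it assumes that `lxc config show NAME --expanded` prints the device option as an indented `shift: "true"` line (quoted or unquoted).

```shell
#!/bin/sh
# Hypothetical helper: reads `lxc config show NAME --expanded` output on
# stdin and succeeds if any device sets shift to true.
# Assumption: the option appears as an indented `shift: "true"` line.
has_shift_device() {
    grep -Eq '^[[:space:]]+shift:[[:space:]]*"?true"?[[:space:]]*$'
}

# Usage against a real container (requires LXD, so left commented out):
#   lxc config show repodraw --expanded | has_shift_device \
#       && echo "a shift=true device is attached"
```

If this reports nothing for a freshly launched container, the device is not part of its configuration and any leftover state would have to live elsewhere.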

techtonik commented 3 years ago

When I re-enable shiftfs and reload the daemon, dot -v starts to fail again. This only happens with this specific container; it does not happen with other containers.

The command I use for testing:

NAME=repodraw; (lxc rm -f $NAME && lxc launch images:ubuntu/20.10 $NAME && lxc exec $NAME -- apt-get -y -qq install graphviz && lxc exec $NAME -- dot -v)

That doesn't make sense, unless the filesystem driver preserves some state related to this container.

techtonik commented 3 years ago

It is pretty insane, but with shiftfs enabled the bug depends on the length of the container name. It only manifests if the name is 7 or 8 characters long.

These work ok:

NAME=ii; (lxc launch images:ubuntu/20.10 $NAME && lxc exec $NAME -- apt-get -y -qq install graphviz && lxc exec $NAME -- dot -v)
NAME=i23; (lxc launch images:ubuntu/20.10 $NAME && lxc exec $NAME -- apt-get -y -qq install graphviz && lxc exec $NAME -- dot -v)
NAME=i23456; (lxc launch images:ubuntu/20.10 $NAME && lxc exec $NAME -- apt-get -y -qq install graphviz && lxc exec $NAME -- dot -v)

These fail:

NAME=i234567; (lxc launch images:ubuntu/20.10 $NAME && lxc exec $NAME -- apt-get -y -qq install graphviz && lxc exec $NAME -- dot -v)
NAME=i2345678; (lxc launch images:ubuntu/20.10 $NAME && lxc exec $NAME -- apt-get -y -qq install graphviz && lxc exec $NAME -- dot -v)
NAME=iiiiiiii; (lxc launch images:ubuntu/20.10 $NAME && lxc exec $NAME -- apt-get -y -qq install graphviz && lxc exec $NAME -- dot -v)

But these again work ok:

NAME=i23456789; (lxc launch images:ubuntu/20.10 $NAME && lxc exec $NAME -- apt-get -y -qq install graphviz && lxc exec $NAME -- dot -v)
NAME=i234567890; (lxc launch images:ubuntu/20.10 $NAME && lxc exec $NAME -- apt-get -y -qq install graphviz && lxc exec $NAME -- dot -v)

That's repeatable.
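The same sweep over name lengths can be scripted rather than typed out one line at a time. The sketch below is not part of the original report: make_name and try_name are hypothetical helpers, the RUN variable is an invented switch, and the failure check assumes that a broken container prints the "no layout engine support" message shown earlier. By default it only prints which names it would test; set RUN=1 to launch real containers.

```shell
#!/bin/sh
# Hypothetical helper: build a name of exactly $1 characters,
# "i" followed by the digits 2,3,...,9,0,1,... (e.g. i23, i234567).
make_name() {
    len=$1
    name="i"
    n=2
    while [ "${#name}" -lt "$len" ]; do
        name="${name}$((n % 10))"
        n=$((n + 1))
    done
    printf '%s\n' "$name"
}

# Hypothetical helper: run the reproduction for one container name.
# Only prints what it would do unless RUN=1 is set in the environment.
try_name() {
    name=$1
    if [ "${RUN:-0}" != 1 ]; then
        echo "would test ${name} (${#name} characters)"
        return 0
    fi
    lxc launch images:ubuntu/20.10 "$name" &&
        lxc exec "$name" -- apt-get -y -qq install graphviz || return 1
    # Assumption: failing containers print this message from `dot -v`.
    if lxc exec "$name" -- dot -v </dev/null 2>&1 |
        grep -q 'no layout engine support'; then
        echo "${name} (${#name} characters): FAIL"
    else
        echo "${name} (${#name} characters): ok"
    fi
}

for len in 2 3 4 5 6 7 8 9 10; do
    try_name "$(make_name "$len")"
done
```

Generating the names from the length keeps the only variable between runs explicit, which makes it easier to compare results with shiftfs enabled and disabled.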

techtonik commented 3 years ago

It also seems that with shiftfs enabled, the guest doesn't immediately detect changes made on the host filesystem.

brauner commented 3 years ago

> It also seems that with shiftfs enabled, the guest doesn't immediately detect changes made on the host filesystem.

If you change the filesystem directly from the host, then shiftfs can't guarantee in all scenarios that the updates are picked up immediately; it basically has to keep two caches in sync. If things don't go terribly wrong after this merge window, shiftfs will slowly be phased out in favor of an upstream solution we wrote that has been merged.

techtonik commented 3 years ago

@brauner thanks for the clarification. It is unfortunate that shiftfs didn't work for me. For my development purposes I don't really need performance, and a secure mount of a network filesystem without caching would solve all my problems. I did that successfully with the 9p2000 filesystem, but it still requires a FUSE driver installed inside the guest. If LXD could extend 9p filesystem support from VMs to containers, I would not need shiftfs.

stgraber commented 3 years ago

Well, 9pfs in VMs is also terrible for performance, that's why the world is switching to virtiofs instead which is significantly faster but also even more tied to how VMs work.

@brauner I still owe you a review on the liblxc side for the shifted mounts but were you planning on also adding direct LXD support or are we going to have to wait for a new LXC release that brings in that support before we can use it?

brauner commented 3 years ago

> Well, 9pfs in VMs is also terrible for performance, that's why the world is switching to virtiofs instead which is significantly faster but also even more tied to how VMs work.
>
> @brauner I still owe you a review on the liblxc side for the shifted mounts but were you planning on also adding direct LXD support or are we going to have to wait for a new LXC release that brings in that support before we can use it?

Direct LXD support as in LXD setting up idmapped mounts for hotplugging into containers? This obviously will not work for virtiofs right now but the virtiofs developers want this to be a thing.

stgraber commented 3 years ago

@brauner direct LXD support as in detecting availability of the new kernel feature on startup and using it everywhere we use shiftfs today.

brauner commented 3 years ago

Oh, yeah. I think that should be mostly doable. It would need to be in the generic part of the storage code, with a test for whether the filesystem supports that feature.

techtonik commented 3 years ago

There is still no explanation for the bug with the container name.

stgraber commented 2 years ago

Closing as this looks like a shiftfs issue and shiftfs isn't seeing development at this point.

We're pushing for as many filesystems to move over to idmapped mounts as possible with currently ext4, xfs, vfat, f2fs and btrfs all supporting it. ZFS is tracked at https://github.com/openzfs/zfs/issues/12923 and we have an experimental patchset for cephfs as well as ongoing work on overlayfs.

stgraber commented 2 years ago

Ubuntu 22.04 ships with both shiftfs and idmapped mounts, with LXD preferring the latter whenever available. shiftfs still requires explicit enablement through snap config, at which point it takes care of the few filesystems without native idmap support.