containerd / nerdctl

contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...
Apache License 2.0

[Rootless] Issues when using systemd-homed (`FATA[0000] operation not permitted`) #2056

Closed: thomascft closed this issue 1 year ago

thomascft commented 1 year ago

Discussed in https://github.com/containerd/nerdctl/discussions/2017

Originally posted by **thomascft** February 15, 2023

I have a few machines running Arch Linux, and on one of them I decided to try out systemd-homed. I followed the guide on https://rootlesscontaine.rs/ and have a working rootless containerd instance running on my non-homed laptop. During diagnosis I created a regular account on the problematic system, and everything worked fine. I created a fresh homed account, and it showed the same symptoms. Below are some of the problematic commands and their respective debug outputs.

`nerdctl run --debug-full --rm hello-world`

```
DEBU[0000] stateDir: /run/user/60121/containerd-rootless
DEBU[0000] rootless parent main: executing "/usr/bin/nsenter" with [-r/ -w/ --preserve-credentials -m -n -U -t 737 -F nerdctl run --debug-full --rm hello-world]
DEBU[0000] verification process skipped
FATA[0000] operation not permitted
```

After that I tried running the alpine image.

`nerdctl run --debug --rm -it alpine`

Here's the resulting error:

```
DEBU[0000] stateDir: /run/user/60121/containerd-rootless
DEBU[0000] rootless parent main: executing "/usr/bin/nsenter" with [-r/ -w/ --preserve-credentials -m -n -U -t 27415 -F nerdctl --debug-full run --rm -it alpine]
DEBU[0000] verification process skipped
FATA[0000] operation not permitted
```

I then tried the archlinux image, and it worked without any errors.

AkihiroSuda commented 1 year ago

Can't repro

$ systemctl is-active systemd-homed
active

$ nerdctl run --rm hello-world

Hello from Docker!
...

$ nerdctl run --rm alpine echo hi
hi
$ nerdctl info
Client:
 Namespace: default
 Debug Mode:    false

Server:
 Server Version: v1.6.19
 Storage Driver: overlayfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Log: fluentd journald json-file syslog
  Storage: btrfs native overlayfs fuse-overlayfs stargz
 Security Options:
  seccomp
   Profile: default
  cgroupns
  rootless
 Kernel Version: 6.1.11-arch1-1
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 3.83GiB
 Name: lima-archlinux
 ID: 01b12985-0c85-47ee-af6f-473dc0b92025

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

$ nerdctl version
Client:
 Version:   v1.2.1
 OS/Arch:   linux/amd64
 Git commit:    a0bbfd75ba92bcb11ac6059bf4f6f4e50c6da0b8
 buildctl:
  Version:  v0.11.3
  GitCommit:    4ddee42a32aac4cd33bf9c2be4c87c2ffd34747b

Server:
 containerd:
  Version:  v1.6.19
  GitCommit:    1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f
 runc:
  Version:  1.1.4
  GitCommit:    v1.1.4-0-g5fd4c4d1

$ systemctl --version
systemd 252 (252.5-1-arch)
+PAM +AUDIT -SELINUX -APPARMOR -IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP -SYSVINIT default-hierarchy=unified

$ df -T
Filesystem     Type        1K-blocks      Used  Available Use% Mounted on
dev            devtmpfs      1982892         0    1982892   0% /dev
run            tmpfs         2008204       696    2007508   1% /run
/dev/vda2      btrfs       104855532   1416944  103300592   2% /
tmpfs          tmpfs         2008204         0    2008204   0% /dev/shm
tmpfs          tmpfs         2008204         0    2008204   0% /tmp
/dev/sr0       iso9660        273676    273676          0 100% /mnt/lima-cidata
tmpfs          tmpfs          401640         4     401636   1% /run/user/501
:/Users/suda   fuse.sshfs 3908112996 711224976 3196888020  19% /Users/suda
:/tmp/lima     fuse.sshfs 3908112996 711224976 3196888020  19% /tmp/lima

(Lima v0.15.0, https://github.com/lima-vm/lima/blob/v0.15.0/examples/archlinux.yaml)

AkihiroSuda commented 1 year ago

This might be related to the filesystem of your home directory. Does `nerdctl run --snapshotter=fuse-overlayfs` or `nerdctl run --snapshotter=native` work?
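
Comparing several snapshotters one after another can be scripted. The helper below is a hypothetical sketch (`try_snapshotters` is not part of nerdctl); it assumes `nerdctl` is on `$PATH` and that each snapshotter name passed in appears under `Storage:` in `nerdctl info`.

```sh
#!/bin/sh
# try_snapshotters IMAGE SNAPSHOTTER...
# Hypothetical helper: runs IMAGE once per snapshotter and reports whether
# `nerdctl run` exited successfully. The body runs in a subshell so its
# variables do not leak into the caller.
try_snapshotters() (
    image=$1
    shift
    for s in "$@"; do
        if nerdctl run --rm --snapshotter="$s" "$image" >/dev/null 2>&1; then
            echo "$s: ok"
        else
            echo "$s: failed"
        fi
    done
)
```

For example, `try_snapshotters alpine overlayfs fuse-overlayfs native`. Output is suppressed, so rerun a failing snapshotter by hand with `--debug-full` to see the actual error.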

thomascft commented 1 year ago

I've tried it with the native snapshotter and didn't see any difference. I've also tried it with systemd-homed users that have both directory and subvolume storage. I'm using btrfs as my root filesystem, if that matters. After pruning all images, running alpine fails with `FATA[000*] operation not permitted`, where the `*` is replaced by a seemingly random number.

thomascft commented 1 year ago

nerdctl info

Client:
 Namespace:     default
 Debug Mode:    false

Server:
 Server Version: v1.6.18
 Storage Driver: overlayfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Log: fluentd journald json-file syslog
  Storage: btrfs native overlayfs
 Security Options:
  seccomp
   Profile: default
  cgroupns
  rootless
 Kernel Version: 6.2.1-zen1-1-zen
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 10.6GiB
 Name: acer
 ID: 71d4fe71-c5ee-44b0-8a32-13c08762eb03

WARNING: IPv4 forwarding is disabled
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

nerdctl version

WARN[0000] unable to determine buildctl version: exec: "buildctl": executable file not found in $PATH
Client:
 Version:       1.2.0
 OS/Arch:       linux/amd64
 Git commit:    5aee2f754a2f46d4c8ccbae25d56b43668ac4b62
 buildctl:
  Version:

Server:
 containerd:
  Version:      v1.6.18
  GitCommit:    2456e983eb9e37e47538f59ea18f2043c9a73640.m
 runc:
  Version:      1.1.4

systemctl --version

systemd 253 (253-1-arch)
+PAM +AUDIT -SELINUX -APPARMOR -IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP -SYSVINIT default-hierarchy=unified

df -T

Filesystem     Type     1K-blocks      Used Available Use% Mounted on
dev            devtmpfs   5536916         0   5536916   0% /dev
run            tmpfs      5556680      1236   5555444   1% /run
/dev/sda2      btrfs    445746176 290455608 154226216  66% /
tmpfs          tmpfs      5556680    138872   5417808   3% /dev/shm
/dev/sda2      btrfs    445746176 290455608 154226216  66% /home
/dev/sda2      btrfs    445746176 290455608 154226216  66% /swap
/dev/sda2      btrfs    445746176 290455608 154226216  66% /var/cache/pacman/pkg
/dev/sda2      btrfs    445746176 290455608 154226216  66% /var/log
tmpfs          tmpfs      5556684     18008   5538676   1% /tmp
/dev/sda1      vfat       1046508    147660    898848  15% /boot
/dev/sda2      btrfs    445746176 290455608 154226216  66% /home/thomas
tmpfs          tmpfs      1111336        20   1111316   1% /run/user/60121

AkihiroSuda commented 1 year ago

Ran the latest `pacman -Syu` on my machine, but still can't repro the issue (linux-6.2.1.arch1-1-x86_64, systemd-253-1-x86_64).

> 6.2.1-zen1-1-zen

Could you try a non-zen kernel?

thomascft commented 1 year ago

Just ran it again with the mainline kernel; no luck there either. It happens on both my workstation and my laptop, so it could have to do with their homed configurations.

homectl inspect $(whoami)

   User name: thomas
       State: active
 Disposition: regular
 Last Change: Tue 2023-02-28 11:41:51 MST
 Last Passw.: Fri 2023-01-20 19:32:09 MST
    Login OK: yes
 Password OK: yes
         UID: 60121
         GID: 60121 (thomas)
 Aux. Groups: wheel
   Real Name: Thomas
   Directory: /home/thomas
     Storage: subvolume (no encryption)
  Image Path: /home/thomas.homedir
   Removable: no
       Shell: /bin/zsh
 Mount Flags: nosuid nodev exec
   Disk Size: 425.0G
   Disk Free: 145.6G (= 34.2%)
  Disk Floor: 5.0M
Disk Ceiling: 5.0T
  Good Auth.: 146
   Last Good: Tue 2023-02-28 21:24:16 MST
   Bad Auth.: 183
    Last Bad: Tue 2023-02-28 21:24:14 MST
    Next Try: anytime
 Auth. Limit: 30 attempts per 1min
   Passwords: 1
  Local Sig.: yes
     Service: io.systemd.Home

AkihiroSuda commented 1 year ago

Sorry, homed was just not working for my home dir. I reproduced the issue locally after creating another account via homectl.

The error seems to be coming from https://github.com/containerd/nerdctl/blob/cc1b6e0da1c65e77f5f5f732ef34ea0c5eb3dcec/cmd/nerdctl/container_run_mount.go#L194, while mounting `Mount{Type:bind Source:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2/fs Target: Options:[ro rbind]}` on `/tmp/initialC2048665509` (`/var/lib/containerd` is mapped to `~/.local/share/containerd`).
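
One possible explanation, which this thread does not confirm: when mounts are inherited by a less privileged mount namespace (such as the one rootless containerd runs in), the kernel locks their read-only, nosuid, nodev and noexec flags and their atime setting (see mount_namespaces(7)), and a remount that tries to drop a locked flag fails with EPERM ("operation not permitted"). The `homectl inspect` output above shows the homed directory is mounted `nosuid nodev`, so a `ro` bind remount that omits those flags could fail this way. The hypothetical `locked_flags` helper below extracts such flags from a `findmnt` option string so they can be compared with the bind mount's `[ro rbind]` options.

```sh
#!/bin/sh
# locked_flags OPTIONS
# Hypothetical helper: given a comma-separated mount option list (as printed
# by `findmnt -no OPTIONS`), print the options the kernel locks for mounts
# inherited by a less privileged mount namespace.
# The body runs in a subshell so the IFS and `set -f` changes stay local.
locked_flags() (
    set -f          # disable globbing while splitting the option list
    IFS=,
    out=
    for opt in $1; do
        case $opt in
            ro|nosuid|nodev|noexec|noatime|nodiratime|relatime|strictatime)
                out=${out:+$out,}$opt ;;
        esac
    done
    printf '%s\n' "$out"
)
```

For example, `locked_flags "$(findmnt -no OPTIONS --target "$HOME/.local/share/containerd")"` shows which flags the mount holding the rootless containerd data directory carries.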

AkihiroSuda commented 1 year ago

PR:

Note: in addition to this PR, the subid ranges in /etc/subuid and /etc/subgid have to begin at 524288, e.g.,

test:524288:65536

Otherwise, running most images will fail with `value too large for defined data type`:

$ ./nerdctl run -it --rm alpine:3.17.0
docker.io/library/alpine:3.17.0:                                                  resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:c0d488a800e4127c334ad20d61d7bc21b4097540327217dfab52262adc02380c: waiting        |--------------------------------------|
config-sha256:49176f190c7e9cdb51ac85ab6c6d5e4512352218190cd69b08e6fd803ffbf3da:   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c158987b05517b6f2c5913f3acef1f2182a32345a304fe357e3ace5fadcad715:    downloading    |+++++++++++++++++++++++---------------|  2.0 MiB/3.2 MiB
elapsed: 10.4s                                                                    total:  2.0 Mi (197.1 KiB/s)
FATA[0010] failed to extract layer sha256:ded7a220bb058e28ee3254fbba04ca90b679070424424761a53a043b93b612bf: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount762573051:
failed to Lchown "/var/lib/containerd/tmpmounts/containerd-mount762573051/etc/shadow" for UID 0, GID 42: lchown /var/lib/containerd/tmpmounts/containerd-mount762573051/etc/shadow: value too large for defined data type: unknown
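
The subid requirement noted in this comment can be checked before pulling any images. The `subid_start_ok` function below is a hypothetical sketch, not part of nerdctl or shadow-utils: it reads `name:start:count` lines from a subuid or subgid file and fails if the user has no entry or has an entry starting below 524288 (the threshold given above). It assumes entries use the user name rather than a numeric UID.

```sh
#!/bin/sh
# subid_start_ok FILE USER [MIN]
# Hypothetical check: succeeds only if FILE (e.g. /etc/subuid) has at least
# one entry for USER and every such entry starts at MIN (default 524288) or above.
subid_start_ok() (
    file=$1 user=$2 min=${3:-524288}
    status=1        # stays 1 if USER has no entry at all
    while IFS=: read -r name start count; do
        [ "$name" = "$user" ] || continue
        status=0
        if [ "$start" -lt "$min" ]; then
            echo "$file: $name:$start:$count starts below $min" >&2
            return 1
        fi
    done < "$file"
    return "$status"
)
```

For example, `subid_start_ok /etc/subuid "$USER" && subid_start_ok /etc/subgid "$USER"`. The example entry `test:524288:65536` covers IDs 524288 through 589823 (524288 + 65536 - 1); with shadow-utils, that range can be assigned with `usermod --add-subuids 524288-589823 --add-subgids 524288-589823 test`, after removing any older range with `--del-subuids` and `--del-subgids`.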

thomascft commented 1 year ago

> Note: in addition to this PR, the subid ranges in /etc/subuid and /etc/subgid have to begin from 524288

I can make a PR for https://rootlesscontaine.rs/ to clarify this. Maybe add a section to docs/rootless.md or docs/FAQ.md?

Edit: It looks like you beat me to it. I'll build HEAD and see how it works on my machine. Thanks for the help!

Edit 2: Everything is working great! I assume I'll have to build from git until the next milestone, when it gets pushed to the distro repos?