containers / fuse-overlayfs

FUSE implementation for overlayfs
GNU General Public License v2.0
502 stars 83 forks source link

`tar: .: file changed as we read it` with `fuse-overlayfs` #384

Open Hi-Angel opened 1 year ago

Hi-Angel commented 1 year ago

/kind bug

Description

I've recently migrated CI to use fuse-overlayfs due to an issue with with native overlay, and after some time it started sporadically failing with error in title. I am not completely sure fuse-overlayfs is the cause, because of an unrelated podman bug which I'll report separately, but that was the only change, so it likely is.

Problem comes down to this:

 λ podman run --rm ubuntu:22.04 sh -c "tar -cf /tmp/flash.tar --one-file-system --exclude=./* -C / ."
tar: .: file changed as we read it

To dispel any doubts the tar command is valid: you can test it on a host system, it will succeed. The tar is being asked to enter / dir with -C /, then to save everything from the current dir that is not --excludeed into /tmp/flash.tar. However, for the purposes of a "minimal testcase" the --exclude command excludes everything, so /tmp/flash.tar will only contain an empty ./ dir.

The tar error comes down to mtime of a / being changed while it's running. Why does that happen, I haven't yet found. I did find though that various odd combinations make that not reproduce. Some examples in the workarounds section below.

The error happens on Ubuntu and Arch and with different containers. I'm reporting Arch as it's my working system and is easily accessible.

Known Workarounds

1.

     λ podman run --rm ubuntu:22.04 sh -c "tar -cf /tmp/flash.tar --one-file-system --exclude=./* -C / .; tar -cf /tmp/flash.tar --one-file-system --exclude=./* -C / ."
    tar: .: file changed as we read it
Here you can see the `tar` command is being called twice, but the error only happens the first time.

2.

    λ podman run --rm ubuntu:22.04 sh -c "ls / >/dev/null && tar -cf /tmp/flash.tar --one-file-system --exclude=./* -C / ."

No error. The only thing changed is we call a ls /, which alone fixes the problem. 3.

    λ podman run --rm -v /tmp/:/tmp ubuntu:22.04 sh -c "touch / && tar -cf /tmp/flash.tar --one-file-system --exclude=./* -C / ."

No error. There are 2 changes:

Steps to reproduce the issue (in terms of terminal commands):

 λ podman run --rm ubuntu:22.04 sh -c "tar -cf /tmp/flash.tar --one-file-system --exclude=./* -C / ."
tar: .: file changed as we read it

Describe the results you received:

An error: tar: .: file changed as we read it

Describe the results you expected:

No output and zero exit code.

Output of podman version:

 λ podman version
Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.3
Git Commit:   814b7b003cc630bf6ab188274706c383f9fb9915-dirty
Built:        Mon Nov 21 01:32:45 2022
OS/Arch:      linux/amd64

Output of podman info:

Details ``` λ podman version Client: Podman Engine Version: 4.3.1 API Version: 4.3.1 Go Version: go1.19.3 Git Commit: 814b7b003cc630bf6ab188274706c383f9fb9915-dirty Built: Mon Nov 21 01:32:45 2022 OS/Arch: linux/amd64 [20.12.2022-17:28:29] constantine@constantine-N61Ja /tmp ‹node-› ‹› λ podman info host: arch: amd64 buildahVersion: 1.28.0 cgroupControllers: - memory - pids cgroupManager: systemd cgroupVersion: v2 conmon: package: /usr/bin/conmon is owned by conmon 1:2.1.5-1 path: /usr/bin/conmon version: 'conmon version 2.1.5, commit: c9f7f19eb82d5b8151fc3ba7fbbccf03fdcd0325' cpuUtilization: idlePercent: 62.71 systemPercent: 9.31 userPercent: 27.98 cpus: 4 distribution: distribution: arch version: unknown eventLogger: journald hostname: constantine-N61Ja idMappings: gidmap: - container_id: 0 host_id: 1000 size: 1 - container_id: 1 host_id: 165536 size: 65536 uidmap: - container_id: 0 host_id: 1000 size: 1 - container_id: 1 host_id: 165536 size: 65536 kernel: 6.0.11-zen1-1-zen linkmode: dynamic logDriver: journald memFree: 893280256 memTotal: 8225759232 networkBackend: cni ociRuntime: name: crun package: /usr/bin/crun is owned by crun 1.7.2-1 path: /usr/bin/crun version: |- crun version 1.7.2 commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4 rundir: /run/user/1000/crun spec: 1.0.0 +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL os: linux remoteSocket: path: /run/user/1000/podman/podman.sock security: apparmorEnabled: false capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT rootless: true seccompEnabled: true seccompProfilePath: /etc/containers/seccomp.json selinuxEnabled: false serviceIsRemote: false slirp4netns: executable: /usr/bin/slirp4netns package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.0-1 version: |- slirp4netns version 1.2.0 commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383 libslirp: 4.7.0 SLIRP_CONFIG_VERSION_MAX: 4 libseccomp: 2.5.4 swapFree: 3161047040 swapTotal: 7918841856 uptime: 272h 34m 42.00s (Approximately 11.33 days) plugins: authorization: null log: - k8s-file - none - passthrough - journald network: - bridge - macvlan - ipvlan volume: - local registries: search: - docker.io - registry.fedoraproject.org - quay.io - registry.access.redhat.com - registry.centos.org store: configFile: /home/constantine/.config/containers/storage.conf containerStore: number: 9 paused: 0 running: 2 stopped: 7 graphDriverName: overlay graphOptions: overlay.mount_program: Executable: /usr/bin/fuse-overlayfs Package: /usr/bin/fuse-overlayfs is owned by fuse-overlayfs 1.10-1 Version: |- fusermount3 version: 3.12.0 fuse-overlayfs: version 1.10 FUSE library version 3.12.0 using FUSE kernel interface version 7.31 graphRoot: /home/constantine/.local/share/containers/storage graphRootAllocated: 991614205952 graphRootUsed: 726417346560 graphStatus: Backing Filesystem: btrfs Native Overlay Diff: "false" Supports d_type: "true" Using metacopy: "false" imageCopyTmpDir: /var/tmp imageStore: number: 30 runRoot: /run/user/1000/containers volumePath: /home/constantine/.local/share/containers/storage/volumes version: APIVersion: 4.3.1 Built: 1668983565 BuiltTime: Mon Nov 21 01:32:45 2022 GitCommit: 814b7b003cc630bf6ab188274706c383f9fb9915-dirty GoVersion: go1.19.3 Os: linux OsArch: linux/amd64 Version: 4.3.1 ```

Package info:

 λ pacman -Qi podman
Name            : podman
Version         : 4.3.1-2
Description     : Tool and library for running OCI-based containers in pods
Architecture    : x86_64
URL             : https://github.com/containers/podman
Licenses        : Apache
Groups          : None
Provides        : None
Depends On      : catatonit  conmon  containers-common  crun  iptables  libdevmapper.so=1.02-64  libgpgme.so=11-64  libseccomp.so=2-64  slirp4netns
Optional Deps   : apparmor: for AppArmor support
                  btrfs-progs: support btrfs backend devices [installed]
                  cni-plugins: for an alternative container-network-stack implementation [installed]
                  podman-compose: for docker-compose compatibility
                  podman-docker: for Docker-compatible CLI
Required By     : None
Optional For    : None
Conflicts With  : None
Replaces        : None
Installed Size  : 66.95 MiB
Packager        : David Runge <dvzrv@archlinux.org>
Build Date      : Пн 21 ноя 2022 01:32:45
Install Date    : Ср 07 дек 2022 23:38:35
Install Reason  : Explicitly installed
Install Script  : No
Validated By    : Signature

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Hi-Angel commented 1 year ago

Great news: I spent a lot of time, but did manage to reduce that down to a small binary that reproduces the problem!

Here's the new steps to reproduce:

 λ cat test.c
#define _GNU_SOURCE
#include <dirent.h>

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <errno.h>
#include <string.h>

void exit_on_system_error(int fd_to_check, const char* msg) {
    if (fd_to_check != -1)
        return;
    fprintf(stderr, "%s:%d: %s: %s\n", __FILE__, __LINE__, msg, strerror(errno));
    _exit(1);
}

int main() {
    exit_on_system_error(creat("/tmp/myfile", 0666),
                         "creat");

    int fd_root_dir = openat(AT_FDCWD, "/", O_RDONLY);
    exit_on_system_error(fd_root_dir, "openat");

    char buf[4096];
    exit_on_system_error(getdents64(fd_root_dir, buf, sizeof(buf)),
                         "getdents64");

    struct stat stat_arg;
    exit_on_system_error(fstatat(AT_FDCWD, "/", &stat_arg, 0),
                         "newfstatat");
}
 λ gcc test.c -o a
 λ podman run --rm -v "/tmp:/mnt" ubuntu:22.04 sh -c "stat -c %y / && /mnt/a && stat -c %y /"
2022-12-21 15:14:31.187780859 +0000
2022-12-21 15:14:31.365780865 +0000

Basically, it happens when creat creates a file somewhere, then openat opens /, then getdents64 is used on it, then fstatat is being called on /.

Removing any of the 4 calls will make the problem no longer reproduce.

giuseppe commented 1 year ago

could you strace the fuse-overlayfs process while you run that program?

Hi-Angel commented 1 year ago

could you strace the fuse-overlayfs process while you run that program?

Sure! Here it is: strace_fuse-overlayfs.txt


To do the strace I used this script:

#!/bin/bash
set -e
strace -v -s 512 2>/tmp/output /usr/bin/fuse-overlayfs "$@"
Hi-Angel commented 1 year ago

while you run that program?

Worth noting though, apparently fuse-overlayfs does not run at the same time the program does. I added a 7 seconds sleep() to the program, but that has no influence on the timestamps in strace output. I think fuse-overlayfs only runs at the beginning of a container launch, and quits before the container command is being run.

giuseppe commented 1 year ago

I think fuse-overlayfs only runs at the beginning of a container launch, and quits before the container command is being run.

it forks itself. You'd need to use the -f option to strace so it follows children processes.

Hi-Angel commented 1 year ago

I think fuse-overlayfs only runs at the beginning of a container launch, and quits before the container command is being run.

it forks itself. You'd need to use the -f option to strace so it follows children processes.

Thanks! Adding a -f option makes it wait indefinitely on some read(7, call. The 7 descriptor is a openat(AT_FDCWD, "/dev/fuse", O_RDWR|O_CLOEXEC) = 7. I see one successful read from it, then one write, and then goes the read(7,. I waited for a few minutes, it just does nothing.

Removing the -f makes the testcase complete immediately.

Hi-Angel commented 1 year ago

Anything else I can provide here?

giuseppe commented 1 year ago

fuse-overlayfs is mostly in maintenance mode at this point since there is a native file system in the kernel that replaces it, so it is unlikely I am going to spend much time troubleshooting fuse-overlayfs unless it is a major issue. So the easiest would be to provide a patch.

Said so, I can give some guidance if needed. Could you please attach the output you get from strace when you add -f?

Hi-Angel commented 1 year ago

Said so, I can give some guidance if needed. Could you please attach the output you get from strace when you add -f?

Sure, here goes: strace_fuse-overlayfs-w-f.txt

lignumqt commented 1 year ago

@giuseppe hello, so, any news, is there anything interesting in the logs?

Hi-Angel commented 1 year ago

there is a native file system in the kernel that replaces it

FTR, it doesn't really. I would love to migrate to the native overlay, however due to https://github.com/containers/podman/issues/16541 it is not currently feasible, and from my understanding of the discussion on that bugreport, there will be no solution any time soon. So it seems like fixing fuse-overlayfs is more productive at this point.

giuseppe commented 1 year ago

@giuseppe hello, so, any news, is there anything interesting in the logs?

I've no time to look into this issue at the moment, if anyone is willing to troubleshoot and possibly open a PR, I'll help to get it merged

Hi-Angel commented 9 months ago

Okay, so, I'm trying to debug the fuse-overlayfs hang under strace. I see in bpftrace that the exact args Podman runs fuse-overlayfs with are:

/usr/bin/fuse-overlayfs -o lowerdir=/home/constantine/.local/share/containers/storage/overlay/l/S4FZLHMYW6WDUORNLKFHZIXOZG,upperdir=/home/constantine/.local/share/containers/storage/overlay/369531141b1ac0c2fe3fc26a5732df66ac56221200c25cb4d456545e530a8b26/diff,workdir=/home/constantine/.local/share/containers/storage/overlay/369531141b1ac0c2fe3fc26a5732df66ac56221200c25cb4d456545e530a8b26/work,,volatile /home/constantine/.local/share/containers/storage/overlay/369531141b1ac0c2fe3fc26a5732df66ac56221200c25cb4d456545e530a8b26/merged

However, executing same command in terminal results in unknown argument ignored:. Is that expected?

Hi-Angel commented 9 months ago

However, executing same command in terminal results in unknown argument ignored:. Is that expected?

Apparently yes, it is. fuse-overlayfs by default even prints two messages:

unknown argument ignored: lazytime
unknown argument ignored: 

one of them I see mentioned on other podman issues. Unclear what causes them since e.g. lazytime isn't among options passed.