checkpoint-restore / criu

Checkpoint/Restore tool

Cannot dump process that opened file in tracefs #2396

Closed igmogo-ku closed 5 months ago

igmogo-ku commented 5 months ago

Description

I would like to dump a process that opened a file in tracefs, but that does not work.

Steps to reproduce the issue:

  1. Create a simple program that opens a file in tracefs
  2. Try to dump it

Describe the results you received:

The parse_mountinfo function invokes tracefs_parse at criu/proc_parse.c:1634, which invariably returns 1 as seen at criu/filesystems.c:574. Consequently, the tracefs filesystem fails to be included in the list at criu/proc_parse.c:1594. This leads to the subsequent failure of criu/files-reg.c:1708 for the file in tracefs.
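The failure chain above can be modelled in a few lines. This is only a sketch, not CRIU's actual code: the names `mount_entry`, `fs_rule`, `collect_mounts` and `lookup_mount` are hypothetical, and the only behavior assumed is the one described in this report, namely that a parse callback returning a positive value keeps the mount out of the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for CRIU's mount bookkeeping. */
struct mount_entry {
    int mnt_id;
    const char *fstype;
    const char *mountpoint;
};

/* Assumed callback convention, taken from the report: 0 = keep, >0 = skip. */
typedef int (*parse_fn)(const struct mount_entry *m);

struct fs_rule {
    const char *name;
    parse_fn parse;
};

int keep_mount(const struct mount_entry *m) { (void)m; return 0; }
int skip_mount(const struct mount_entry *m) { (void)m; return 1; }

/* Copy every mount that its filesystem rule does not skip; return the count kept. */
size_t collect_mounts(const struct mount_entry *in, size_t n_in,
                      const struct fs_rule *rules, size_t n_rules,
                      struct mount_entry *out)
{
    size_t kept = 0;

    assert(in && rules && out);
    for (size_t i = 0; i < n_in; i++) {
        int skip = 0;

        for (size_t r = 0; r < n_rules; r++) {
            if (strcmp(rules[r].name, in[i].fstype) == 0 && rules[r].parse)
                skip = rules[r].parse(&in[i]) > 0;
        }
        if (!skip)
            out[kept++] = in[i]; /* only non-skipped mounts reach the list */
    }
    return kept;
}

/* Resolve a mount ID in the collected list; NULL models "Can't lookup mount". */
const struct mount_entry *lookup_mount(const struct mount_entry *list, size_t n, int mnt_id)
{
    assert(list);
    for (size_t i = 0; i < n; i++)
        if (list[i].mnt_id == mnt_id)
            return &list[i];
    return NULL;
}
```

With a rule table that maps tracefs to `skip_mount`, looking up the mount ID of an open tracefs file returns NULL, which corresponds to the lookup error in the dump log. Mapping tracefs to `keep_mount` makes the same lookup succeed.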

Describe the results you expected:

Since files opened in tracefs do not need to be dumped or restored (as is the case with files in debugfs), I would expect that tracefs_parse simply returns 0. If I alter this manually in the code, my program can be dumped and restored normally. However, I suspect I might be overlooking something, as this is my first experience using CRIU.

CRIU logs and information:

CRIU full dump/restore logs:

```
(00.027546) Dumping path for 3 fd via self 17 [/tmp/criu-test.normal-file]
(00.027565) Only file size could be stored for validation for file /tmp/criu-test.normal-file
(00.027616) 58361 fdinfo 4: pos: 0 flags: 400000/0
(00.027658) Dumping path for 4 fd via self 18 [/sys/kernel/debug/memblock/memory]
(00.027694) Only file size could be stored for validation for file /sys/kernel/debug/memblock/memory
(00.027752) 58361 fdinfo 5: pos: 0 flags: 400000/0
(00.027788) Error (criu/files-reg.c:1710): Can't lookup mount=288 for fd=5 path=/sys/kernel/debug/tracing/dynamic_events
(00.027810) ----------------------------------------
(00.027943) Error (criu/cr-dump.c:1635): Dump files (pid: 58361) failed with -1
```

Output of `criu --version`:

```
Version: 3.17.1
GitID: debian/3.17.1-2-11-g89adc6652
```

Output of `criu check --all`:

```
Warn (criu/cr-check.c:813): Dirty tracking is OFF. Memory snapshot will not work.
Warn (criu/cr-check.c:1148): Loginuid restore is OFF.
Warn (criu/cr-check.c:1242): Do not have API to map vDSO - will use mremap() to restore vDSO
Warn (criu/cr-check.c:1334): Nftables based locking requires libnftables and set concatenations support
Warn (criu/cr-check.c:1162): CRIU built without CONFIG_COMPAT - can't C/R compatible tasks
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
```

Thank you :)

Snorch commented 5 months ago

Originally, in commit 4497ac8e6af0ac7bf0cc7f87be7744258a90f131, the intent was to skip a tracefs mount on top of a debugfs mount. On restore this tracefs is mounted automatically, so if CRIU also mounts it there explicitly, one excess tracefs mount appears after each c/r.

Actually on my Fedora I have both "nested" tracefs and separately mounted tracefs:

cat /proc/self/mountinfo | grep "tracefs\|debugfs"
37 24 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:18 - debugfs debugfs rw
38 24 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:19 - tracefs tracefs rw
802 37 0:12 / /sys/kernel/debug/tracing rw,nosuid,nodev,noexec,relatime shared:610 - tracefs tracefs rw

The code does not differentiate between those; that is the first problem with the code.
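To make the distinction concrete, here is a minimal sketch that parses mountinfo lines and tells a tracefs mounted on top of debugfs apart from a standalone one by looking at the parent mount ID. The names `mnt_line`, `parse_mountinfo_line` and `is_tracefs_on_debugfs` are hypothetical and not part of CRIU. The sketch assumes the standard mountinfo field layout: mount ID, parent ID, major:minor, root, mount point, options, optional fields, a ` - ` separator, then the filesystem type.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The subset of a mountinfo line needed for the classification. */
struct mnt_line {
    int mnt_id;
    int parent_id;
    char mountpoint[256];
    char fstype[32];
};

/* Parse one /proc/<pid>/mountinfo line; return 0 on success, -1 if malformed. */
int parse_mountinfo_line(const char *line, struct mnt_line *out)
{
    const char *sep;

    assert(line && out);
    /* Fields 1, 2 and 5: mount ID, parent ID, mount point. */
    if (sscanf(line, "%d %d %*s %*s %255s",
               &out->mnt_id, &out->parent_id, out->mountpoint) != 3)
        return -1;
    /* The optional fields end at " - "; the filesystem type follows. */
    sep = strstr(line, " - ");
    if (!sep || sscanf(sep + 3, "%31s", out->fstype) != 1)
        return -1;
    return 0;
}

/* Return 1 if mounts[idx] is a tracefs mount whose parent mount is debugfs. */
int is_tracefs_on_debugfs(const struct mnt_line *mounts, size_t n, size_t idx)
{
    assert(mounts && idx < n);
    if (strcmp(mounts[idx].fstype, "tracefs") != 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (mounts[i].mnt_id == mounts[idx].parent_id)
            return strcmp(mounts[i].fstype, "debugfs") == 0;
    return 0; /* parent not present in the list */
}
```

Applied to the three lines above, only mount 802 (parent 37, debugfs) is classified as nested. Mount 38 at /sys/kernel/tracing has parent 24 and is treated as a standalone tracefs.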

The second problem is that skipping leaves the tracefs mount invisible in the mount tree, which is why files on this mount can't be handled and lead to an error. A proper solution is probably to stop skipping this mount on dump and instead skip restoring it explicitly when it is on top of debugfs.
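A rough sketch of that idea, under stated assumptions: every mount is recorded at dump time, and the decision moves to the restore side, where a tracefs whose parent is debugfs is not mounted explicitly. The names `mnt_node`, `needs_explicit_mount` and `plan_explicit_mounts` are hypothetical, and this is not how CRIU is implemented; it only illustrates the proposed split between dump and restore.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mount-tree node as it might be recorded at dump time. */
struct mnt_node {
    int mnt_id;
    const char *fstype;
    const struct mnt_node *parent; /* NULL for the root mount */
};

/*
 * Restore-side policy: a tracefs sitting directly on debugfs is assumed to
 * appear automatically, so it is not mounted explicitly. Everything else is.
 */
int needs_explicit_mount(const struct mnt_node *m)
{
    assert(m && m->fstype);
    if (strcmp(m->fstype, "tracefs") != 0)
        return 1;
    if (m->parent && strcmp(m->parent->fstype, "debugfs") == 0)
        return 0;
    return 1;
}

/* Fill ids_out with the IDs of mounts to restore explicitly; return the count. */
size_t plan_explicit_mounts(const struct mnt_node *nodes, size_t n, int *ids_out)
{
    size_t planned = 0;

    assert(nodes && ids_out);
    for (size_t i = 0; i < n; i++)
        if (needs_explicit_mount(&nodes[i]))
            ids_out[planned++] = nodes[i].mnt_id;
    return planned;
}
```

Because the nested tracefs stays in the recorded tree, a file opened on it can still be resolved at dump time. Only the explicit mount step is dropped on restore.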

The third problem I see is that neither tracefs nor debugfs seems to be virtualized (correct me if I'm wrong); they belong to the host. Thus, if CRIU migrates an open file on tracefs/debugfs to another host, the file may become meaningless due to a different tracefs setup, or even lead to something completely unexpected.

So I would rather eliminate debugfs and tracefs from the container you are migrating, and also not migrate apps that use tracefs and debugfs, because this can lead to inconsistent setups.

igmogo-ku commented 5 months ago

Hi @Snorch,

First of all, thank you very much for taking the time to write such a detailed response.

What I am trying to do is dump a Podman container in order to restore it at a later moment (but on the same machine). This means there is no risk that the tracing or debug filesystems are missing when restoring. The host runs Debian and the Podman image is Debian as well. On the host, tracefs is also mounted twice.

34 24 0:11 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:14 - tracefs tracefs rw
35 24 0:7 / /sys/kernel/debug rw,relatime shared:15 - debugfs none rw
288 35 0:11 / /sys/kernel/debug/tracing rw,relatime shared:162 - tracefs tracefs rw

For my application, the content and state of the opened tracefs and debugfs files are not important after the container is restored.

I wrote a small test application to check what happens on dump/restore with different types of files open. Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Exit with a diagnostic if fopen() returned NULL for the given stream. */
#define EXIT_IF_NOT_OPEN(file)              \
    do                                      \
    {                                       \
        if (NULL == (file))                 \
        {                                   \
            perror("Error opening " #file); \
            exit(EXIT_FAILURE);             \
        }                                   \
    } while (0)

#define PID_FILE_PATH "/tmp/file-opener.pid"
#define NORMAL_FILE_PATH "/tmp/normalFile"
#define DEBUG_FS_FILE_PATH "/sys/kernel/debug/memblock/memory"
#define TRACE_FS_FILE_PATH "/sys/kernel/tracing/enabled_functions"

int main(void)
{
    /* Write our PID to a file so the process is easy to find. */
    const int pid = getpid();
    FILE *pidFile = fopen(PID_FILE_PATH, "w");
    EXIT_IF_NOT_OPEN(pidFile);
    fprintf(pidFile, "%d\n", pid);
    fclose(pidFile);

    /* Keep one regular, one debugfs and one tracefs file open during the dump. */
    FILE *normalFile = fopen(NORMAL_FILE_PATH, "w");
    EXIT_IF_NOT_OPEN(normalFile);

    FILE *debugFsFile = fopen(DEBUG_FS_FILE_PATH, "r");
    EXIT_IF_NOT_OPEN(debugFsFile);

    FILE *traceFsFile = fopen(TRACE_FS_FILE_PATH, "r");
    EXIT_IF_NOT_OPEN(traceFsFile);

    /* Print a counter forever so it is visible that the process resumed. */
    for (int i = 0;; ++i)
    {
        printf("PID: %d, count:%d\n", pid, i);
        fflush(stdout);
        sleep(1);
    }
}

If I start a container running that application:

sudo podman run \
   --detach \
   --network=host \
   --mount "type=bind,source=/tmp/file-opener,target=/home/root" \
   --mount "type=bind,source=/sys/kernel/tracing,target=/sys/kernel/tracing" \
   --mount "type=bind,source=/sys/kernel/debug,target=/sys/kernel/debug" \
   --name tc \
   docker.io/arm64v8/debian:latest \
   /home/root/file-opener

Dump and restore work perfectly if I modify tracefs_parse (criu/filesystems.c:576) to always return 0.

sudo podman container checkpoint -l -k
e8a9e19d9c21a9c17a04752d3b95751f1b925c7055a3e4
sudo podman container restore -l -k   
e8a9e19d9c21a9c17a04752d3b95751f1b925c7055a3e4

restore.log says:

(00.004303) mnt:        Read 488 mp @ /sys/kernel/tracing
(00.004322) mnt:                Will mount 487 from /
(00.004340) mnt:                Will mount 487 @ /tmp/.criu.mntns.gSkbyZ/mnt-0000000487 /sys/kernel/debug/tracing
(00.004357) mnt:        Read 487 mp @ /sys/kernel/debug/tracing
(00.004378) mnt:                Will mount 486 from /sys/kernel/debug (E)
(00.004396) mnt:                Will mount 486 @ /tmp/.criu.mntns.gSkbyZ/mnt-0000000486 /sys/kernel/debug
(00.004411) mnt:        Read 486 mp @ /sys/kernel/debug
(00.004433) mnt:                Will mount 485 from /var/run/containers/storage/overlay-containers/e8a9e19d9c21a9c17a04752d3b95751f1b925c7055a3e4

Snorch commented 5 months ago

Your change basically breaks the mnt_tracefs zdtm test:

```
[root@turmoil criu]# git diff
diff --git a/criu/filesystems.c b/criu/filesystems.c
index 093e1c492..433394b72 100644
--- a/criu/filesystems.c
+++ b/criu/filesystems.c
@@ -572,11 +572,6 @@ static int debugfs_parse(struct mount_info *pm)
 	return 0;
 }
 
-static int tracefs_parse(struct mount_info *pm)
-{
-	return 1;
-}
-
 static bool cgroup_sb_equal(struct mount_info *a, struct mount_info *b)
 {
 	if (a->private && b->private && strcmp(a->private, b->private))
@@ -744,7 +739,6 @@ static struct fstype fstypes[] = {
 	{
 		.name = "tracefs",
 		.code = FSTYPE__TRACEFS,
-		.parse = tracefs_parse,
 	},
 	{
 		.name = "cgroup",
[root@turmoil criu]# test/zdtm.py run -t zdtm/static/mnt_tracefs
userns is supported
Checking feature mnt_id
mnt_id is supported
=== Run 1/1 ================ zdtm/static/mnt_tracefs
====================== Run zdtm/static/mnt_tracefs in uns ======================
Start test
Running zdtm/static/mnt_tracefs.hook(--post-start)
./mnt_tracefs --pidfile=mnt_tracefs.pid --outfile=mnt_tracefs.out --dirname=mnt_tracefs.test
Running zdtm/static/mnt_tracefs.hook(--pre-dump)
Run criu dump
Running zdtm/static/mnt_tracefs.hook(--pre-restore)
Run criu restore
=[log]=> dump/zdtm/static/mnt_tracefs/64/1/restore.log
------------------------ grep Error ------------------------
b'(00.004337) 1: No ipcns-sem-11.img image'
b'(00.005344) 1: net: Try to restore a link 10:1:lo'
b'(00.005359) 1: net: Restoring link lo type 1'
b'(00.005846) 1: net: \tRunning ip addr restore'
b'Error: ipv4: Address already assigned.'
b'Error: ipv6: address already assigned.'
b'(00.028274) 1: mnt: \tBind /sys/kernel/debug/ to /tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test'
b'(00.028294) 1: mnt: 1491:/tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test private 0 shared 0 slave 1'
b'(00.028301) 1: mnt: \tMounting tracefs 1492@/tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test/tracing (0)'
b'(00.028303) 1: mnt: \tBind /sys/kernel/debug/tracing/ to /tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test/tracing'
b"(00.028316) 1: Error (criu/mount.c:2507): mnt: Can't bind-mount at /tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test/tracing: Permission denied"
b'(00.029233) uns: calling exit_usernsd (-1, 1)'
b'(00.029410) uns: daemon calls 0x478080 (89, -1, 1)'
b'(00.029420) uns: `- daemon exits w/ 0'
b'(00.029959) uns: daemon stopped'
b'(00.029972) Error (criu/cr-restore.c:2571): Restoring FAILED.'
------------------------ ERROR OVER ------------------------
############## Test zdtm/static/mnt_tracefs FAIL at CRIU restore ###############
Test output: ================================
 <<< ================================
Running zdtm/static/mnt_tracefs.hook(--clean)
##################################### FAIL #####################################
```

In your case this change helps, but with an external master tracefs mount it breaks things. I don't see a general solution...

Because of problem (3), which I mentioned in my previous message, I believe it is best to avoid having tracefs and debugfs in the container.

igmogo-ku commented 5 months ago

Hi, thanks for the info. Then I will close this issue.