Closed igmogo-ku closed 5 months ago
Originally, in commit 4497ac8e6af0ac7bf0cc7f87be7744258a90f131, the intent was to skip a tracefs mount that sits on top of a debugfs mount: on restore this tracefs is mounted automatically, so if CRIU also mounted it there explicitly, one excess tracefs mount would appear after each c/r.
Actually, on my Fedora system I have both a "nested" tracefs and a separately mounted tracefs:
```
cat /proc/self/mountinfo | grep "tracefs\|debugfs"
37 24 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:18 - debugfs debugfs rw
38 24 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:19 - tracefs tracefs rw
802 37 0:12 / /sys/kernel/debug/tracing rw,nosuid,nodev,noexec,relatime shared:610 - tracefs tracefs rw
```
The code does not differentiate between those two; that is the first problem with the code.
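The distinction between the "nested" and the separately mounted tracefs can be read straight from the mount ID and parent ID columns of `/proc/self/mountinfo`. The following is only a minimal sketch of that idea, not CRIU code: `struct mnt_entry`, `parse_mountinfo_line` and `tracefs_on_debugfs` are hypothetical names introduced here for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding only the mountinfo fields needed here. */
struct mnt_entry {
	int id;          /* field 1: mount ID */
	int parent_id;   /* field 2: parent mount ID */
	char fstype[32]; /* first field after the " - " separator */
};

/* Parse one mountinfo line; returns 0 on success, -1 on malformed input. */
static int parse_mountinfo_line(const char *line, struct mnt_entry *e)
{
	const char *sep;

	if (sscanf(line, "%d %d", &e->id, &e->parent_id) != 2)
		return -1;
	sep = strstr(line, " - ");
	if (!sep || sscanf(sep + 3, "%31s", e->fstype) != 1)
		return -1;
	return 0;
}

/*
 * Returns 1 if mount `id` is a tracefs whose parent mount is a debugfs,
 * 0 if it is anything else, and -1 if `id` is not in the table.
 */
static int tracefs_on_debugfs(const struct mnt_entry *tab, int n, int id)
{
	const struct mnt_entry *m = NULL, *p = NULL;
	int i;

	for (i = 0; i < n; i++)
		if (tab[i].id == id)
			m = &tab[i];
	if (!m)
		return -1;
	if (strcmp(m->fstype, "tracefs") != 0)
		return 0;
	for (i = 0; i < n; i++)
		if (tab[i].id == m->parent_id)
			p = &tab[i];
	return p && strcmp(p->fstype, "debugfs") == 0;
}
```

Fed with the three lines above, such a check would flag mount 802 (parent 37, a debugfs) as nested, while mount 38 (parent 24) would not be flagged.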
The second problem is that the tracefs mount is no longer visible in the mount tree, so files on this mount cannot be handled and lead to an error. The proper solution is probably, instead of skipping this mount on dump, to skip restoring it explicitly when it is on top of debugfs.
The third problem I can see is that neither tracefs nor debugfs seems to be virtualized (correct me if I'm wrong); they belong to the host. Thus, if CRIU migrates an open file on tracefs/debugfs to another host, this file may become meaningless due to a different tracefs setup, or even lead to something completely unexpected.
So I would rather eliminate debugfs and tracefs from the container you are migrating, and also not migrate apps that use tracefs and debugfs, because this can lead to inconsistent setups.
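One way to find out whether an application holds files on these host-owned filesystems before attempting a migration is to compare the filesystem magic reported by `fstatfs()`. This is a sketch under assumptions, not something CRIU does: the helper names `is_debug_or_trace_magic` and `fd_on_debug_or_trace_fs` are made up for illustration, and the magic values are the ones from the kernel's `linux/magic.h`, defined locally in case that header is not available.

```c
#include <sys/vfs.h>

/* Values as defined in linux/magic.h; provided here as a fallback. */
#ifndef DEBUGFS_MAGIC
#define DEBUGFS_MAGIC 0x64626720
#endif
#ifndef TRACEFS_MAGIC
#define TRACEFS_MAGIC 0x74726163
#endif

/* Hypothetical helper: 1 if the magic belongs to debugfs or tracefs. */
static int is_debug_or_trace_magic(long f_type)
{
	return f_type == DEBUGFS_MAGIC || f_type == TRACEFS_MAGIC;
}

/*
 * Hypothetical helper: 1 if `fd` refers to a file on debugfs or tracefs,
 * 0 if it lives elsewhere, -1 if fstatfs() fails (for example, bad fd).
 */
static int fd_on_debug_or_trace_fs(int fd)
{
	struct statfs st;

	if (fstatfs(fd, &st) < 0)
		return -1;
	return is_debug_or_trace_magic((long)st.f_type);
}
```

A pre-migration check could call `fd_on_debug_or_trace_fs()` on each descriptor of interest and refuse to migrate when any call returns 1.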
Hi @Snorch,
First of all, thank you very much for taking the time to write such a detailed response.
What I am trying to do is dump a Podman container and restore it at a later time, on the same machine. This means there is no risk that the tracing or debug filesystems are missing on restore. The host runs Debian and the Podman image is Debian as well. On the host, tracefs is also mounted twice:
```
34 24 0:11 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:14 - tracefs tracefs rw
35 24 0:7 / /sys/kernel/debug rw,relatime shared:15 - debugfs none rw
288 35 0:11 / /sys/kernel/debug/tracing rw,relatime shared:162 - tracefs tracefs rw
```
The content and state of the open tracefs and debugfs files are not important to my application after the container is restored.
I wrote a small test application to check what happens on dump/restore with different types of files open. Here is the code:
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Print the failing variable name via perror() and exit if fopen() returned NULL. */
#define EXIT_IF_NOT_OPEN(pidFile)                          \
    do                                                     \
    {                                                      \
        if (NULL == (pidFile))                             \
        {                                                  \
            perror("Error opening " #pidFile);             \
            exit(EXIT_FAILURE);                            \
        }                                                  \
    } while (0)

#define PID_FILE_PATH "/tmp/file-opener.pid"
#define NORMAL_FILE_PATH "/tmp/normalFile"
#define DEBUG_FS_FILE_PATH "/sys/kernel/debug/memblock/memory"
#define TRACE_FS_FILE_PATH "/sys/kernel/tracing/enabled_functions"

int main(void)
{
    const int pid = getpid();

    /* Record the PID so the process can be found for dumping. */
    FILE *pidFile = fopen(PID_FILE_PATH, "w");
    EXIT_IF_NOT_OPEN(pidFile);
    fprintf(pidFile, "%d\n", pid);
    fclose(pidFile);

    /* Keep one regular, one debugfs and one tracefs file open across dump/restore. */
    FILE *normalFile = fopen(NORMAL_FILE_PATH, "w");
    EXIT_IF_NOT_OPEN(normalFile);
    FILE *debugFsFile = fopen(DEBUG_FS_FILE_PATH, "r");
    EXIT_IF_NOT_OPEN(debugFsFile);
    FILE *traceFsFile = fopen(TRACE_FS_FILE_PATH, "r");
    EXIT_IF_NOT_OPEN(traceFsFile);

    /* Print a heartbeat forever so the counter shows whether restore resumed the process. */
    for (int i = 0;; ++i)
    {
        printf("PID: %d, count:%d\n", pid, i);
        fflush(stdout);
        sleep(1);
    }
}
```
I start a container running that application:
```
sudo podman run \
    --detach \
    --network=host \
    --mount "type=bind,source=/tmp/file-opener,target=/home/root" \
    --mount "type=bind,source=/sys/kernel/tracing,target=/sys/kernel/tracing" \
    --mount "type=bind,source=/sys/kernel/debug,target=/sys/kernel/debug" \
    --name tc \
    docker.io/arm64v8/debian:latest \
    /home/root/file-opener
```
Dump and restore work perfectly if I modify `tracefs_parse` (`criu/filesystems.c:576`) to always return `0`.
```
sudo podman container checkpoint -l -k
e8a9e19d9c21a9c17a04752d3b95751f1b925c7055a3e4
sudo podman container restore -l -k
e8a9e19d9c21a9c17a04752d3b95751f1b925c7055a3e4
```
restore.log says:
```
(00.004303) mnt: Read 488 mp @ /sys/kernel/tracing
(00.004322) mnt: Will mount 487 from /
(00.004340) mnt: Will mount 487 @ /tmp/.criu.mntns.gSkbyZ/mnt-0000000487 /sys/kernel/debug/tracing
(00.004357) mnt: Read 487 mp @ /sys/kernel/debug/tracing
(00.004378) mnt: Will mount 486 from /sys/kernel/debug (E)
(00.004396) mnt: Will mount 486 @ /tmp/.criu.mntns.gSkbyZ/mnt-0000000486 /sys/kernel/debug
(00.004411) mnt: Read 486 mp @ /sys/kernel/debug
(00.004433) mnt: Will mount 485 from /var/run/containers/storage/overlay-containers/e8a9e19d9c21a9c17a04752d3b95751f1b925c7055a3e4
```
Your change is basically
```
[root@turmoil criu]# git diff
diff --git a/criu/filesystems.c b/criu/filesystems.c
index 093e1c492..433394b72 100644
--- a/criu/filesystems.c
+++ b/criu/filesystems.c
@@ -572,11 +572,6 @@ static int debugfs_parse(struct mount_info *pm)
 	return 0;
 }
 
-static int tracefs_parse(struct mount_info *pm)
-{
-	return 1;
-}
-
 static bool cgroup_sb_equal(struct mount_info *a, struct mount_info *b)
 {
 	if (a->private && b->private && strcmp(a->private, b->private))
@@ -744,7 +739,6 @@ static struct fstype fstypes[] = {
 	{
 		.name = "tracefs",
 		.code = FSTYPE__TRACEFS,
-		.parse = tracefs_parse,
 	},
 	{
 		.name = "cgroup",
[root@turmoil criu]# test/zdtm.py run -t zdtm/static/mnt_tracefs
userns is supported
Checking feature mnt_id
mnt_id is supported
=== Run 1/1 ================ zdtm/static/mnt_tracefs
====================== Run zdtm/static/mnt_tracefs in uns ======================
Start test
Running zdtm/static/mnt_tracefs.hook(--post-start)
./mnt_tracefs --pidfile=mnt_tracefs.pid --outfile=mnt_tracefs.out --dirname=mnt_tracefs.test
Running zdtm/static/mnt_tracefs.hook(--pre-dump)
Run criu dump
Running zdtm/static/mnt_tracefs.hook(--pre-restore)
Run criu restore
=[log]=> dump/zdtm/static/mnt_tracefs/64/1/restore.log
------------------------ grep Error ------------------------
b'(00.004337) 1: No ipcns-sem-11.img image'
b'(00.005344) 1: net: Try to restore a link 10:1:lo'
b'(00.005359) 1: net: Restoring link lo type 1'
b'(00.005846) 1: net: \tRunning ip addr restore'
b'Error: ipv4: Address already assigned.'
b'Error: ipv6: address already assigned.'
b'(00.028274) 1: mnt: \tBind /sys/kernel/debug/ to /tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test'
b'(00.028294) 1: mnt: 1491:/tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test private 0 shared 0 slave 1'
b'(00.028301) 1: mnt: \tMounting tracefs 1492@/tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test/tracing (0)'
b'(00.028303) 1: mnt: \tBind /sys/kernel/debug/tracing/ to /tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test/tracing'
b"(00.028316) 1: Error (criu/mount.c:2507): mnt: Can't bind-mount at /tmp/.criu.mntns.xt9rIN/14-0000000000/zdtm/static/mnt_tracefs.test/tracing: Permission denied"
b'(00.029233) uns: calling exit_usernsd (-1, 1)'
b'(00.029410) uns: daemon calls 0x478080 (89, -1, 1)'
b'(00.029420) uns: `- daemon exits w/ 0'
b'(00.029959) uns: daemon stopped'
b'(00.029972) Error (criu/cr-restore.c:2571): Restoring FAILED.'
------------------------ ERROR OVER ------------------------
############## Test zdtm/static/mnt_tracefs FAIL at CRIU restore ###############
Test output: ================================
 <<< ================================
Running zdtm/static/mnt_tracefs.hook(--clean)
##################################### FAIL #####################################
```
In your case this change helps, but with an external master tracefs mount (as in the `mnt_tracefs` test) it breaks things. I don't see a general solution...
Due to problem (3), which I mentioned in my previous message, I believe it is best to avoid having tracefs and debugfs in the container.
Hi, thanks for the info. Then I will close this issue.
Description
I would like to dump a process that opened a file in `tracefs`, but that does not work.
Steps to reproduce the issue:
tracefs
Describe the results you received:
The `parse_mountinfo` function invokes `tracefs_parse` at `criu/proc_parse.c:1634`, which invariably returns `1` as seen at `criu/filesystems.c:574`. Consequently, the `tracefs` filesystem fails to be included in the list at `criu/proc_parse.c:1594`. This leads to the subsequent failure of `criu/files-reg.c:1708` for the file in `tracefs`.
Describe the results you expected:
Since files opened in `tracefs` do not need to be dumped or restored (as is the case with files in `debugfs`), I would expect that `tracefs_parse` simply returns `0`. If I alter this manually in the code, my program can be dumped and restored normally. However, I suspect I might be overlooking something, as this is my first experience using CRIU.
CRIU logs and information:
CRIU full dump/restore logs:
```
(00.027546) Dumping path for 3 fd via self 17 [/tmp/criu-test.normal-file]
(00.027565) Only file size could be stored for validation for file /tmp/criu-test.normal-file
(00.027616) 58361 fdinfo 4: pos: 0 flags: 400000/0
(00.027658) Dumping path for 4 fd via self 18 [/sys/kernel/debug/memblock/memory]
(00.027694) Only file size could be stored for validation for file /sys/kernel/debug/memblock/memory
(00.027752) 58361 fdinfo 5: pos: 0 flags: 400000/0
(00.027788) Error (criu/files-reg.c:1710): Can't lookup mount=288 for fd=5 path=/sys/kernel/debug/tracing/dynamic_events
(00.027810) ----------------------------------------
(00.027943) Error (criu/cr-dump.c:1635): Dump files (pid: 58361) failed with -1
```
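The `Can't lookup mount=288` error follows from the mount having been dropped while the mount table was built. The sketch below is a deliberately simplified model of that mechanism, not CRIU's actual code: `struct mount_rec`, `struct fs_handler`, `skip_always`, `collect_mounts` and `lookup_mount` are hypothetical names, and the only behavior taken from this report is that a per-filesystem parse hook returning `1` keeps the mount out of the list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down mount record. */
struct mount_rec {
	int id;
	const char *fstype;
};

/* Hypothetical per-filesystem hook: a result > 0 means "skip this mount". */
struct fs_handler {
	const char *name;
	int (*parse)(const struct mount_rec *m);
};

static int skip_always(const struct mount_rec *m)
{
	(void)m;
	return 1;
}

static const struct fs_handler handlers[] = {
	{ "debugfs", NULL },        /* no hook: mount is kept */
	{ "tracefs", skip_always }, /* hook returns 1: mount is dropped */
};

/* Copy every mount that is not skipped by its hook; returns the count kept. */
static size_t collect_mounts(const struct mount_rec *in, size_t n,
			     struct mount_rec *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		int skip = 0;

		for (size_t h = 0; h < sizeof(handlers) / sizeof(handlers[0]); h++)
			if (!strcmp(in[i].fstype, handlers[h].name) && handlers[h].parse)
				skip = handlers[h].parse(&in[i]) > 0;
		if (!skip)
			out[kept++] = in[i];
	}
	return kept;
}

/* Find a mount by ID in the collected table; NULL if it was never added. */
static const struct mount_rec *lookup_mount(const struct mount_rec *tab,
					    size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].id == id)
			return &tab[i];
	return NULL;
}
```

In this model, collecting the host mounts 34 (tracefs), 35 (debugfs) and 288 (tracefs) keeps only mount 35, so a later lookup of mount 288 for an open file returns `NULL`, which corresponds to the failure in the log.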
Output of `criu --version`:
```
Version: 3.17.1
GitID: debian/3.17.1-2-11-g89adc6652
```
Output of `criu check --all`:
```
Warn  (criu/cr-check.c:813): Dirty tracking is OFF. Memory snapshot will not work.
Warn  (criu/cr-check.c:1148): Loginuid restore is OFF.
Warn  (criu/cr-check.c:1242): Do not have API to map vDSO - will use mremap() to restore vDSO
Warn  (criu/cr-check.c:1334): Nftables based locking requires libnftables and set concatenations support
Warn  (criu/cr-check.c:1162): CRIU built without CONFIG_COMPAT - can't C/R compatible tasks
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
```
Thank you :)