checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

Passing inherit-fd options to child processes #2363

Closed: muvaf closed this issue 3 months ago

muvaf commented 3 months ago

Description

The --inherit-fd and --external options work on the file descriptors of the process whose PID is passed to criu. However, there doesn't seem to be a way to pass these options down to its child processes.

I have a process tree in which 9289 is the parent of both 9290 and 9291 (9289 -> 9290, 9289 -> 9291), and I pass 9289 as the PID to the criu dump command. However, both the --inherit-fd and --external options accept file descriptors only for 9289. There is no way, for example, to say that I want 9290 to inherit an fd.

Am I missing a documentation page or is it simply not possible today?

rst0git commented 3 months ago

@muvaf This functionality is intended for use with namespaces, and it is not PID-specific. For example, --external file[mnt_id:inode] is used for file descriptors that cannot be resolved from the current mount namespace (i.e., a file is identified by mnt_id and inode).

--inherit-fd 'fd[N]:path/to/file' can also be used to replace a file descriptor during restore (as described in https://criu.org/Inheriting_FDs_on_restore). In this case, path/to/file is used to identify the file descriptor that will be replaced. This works for child processes and you can use crit show files.img to display the paths saved in a checkpoint.

muvaf commented 3 months ago

Thanks @rst0git, that cleared up some of my misunderstanding. It looks like my problem is that the file whose file descriptor I'm trying to replace is read-only, and in that case we get a bad file descriptor error (i.e. criu restore --inherit-fd 'fd[7]:sys/fs/cgroup/somepath/memory.current' 7>/sys/fs/cgroup/memory.current). Running it in Bash fails immediately:

$ criu restore 7>/sys/fs/cgroup/memory.current
bash: /sys/fs/cgroup/memory.current: Read-only file system

I'll look into whether I can change the file path in files.img, but I doubt that would work well. Maybe my wrapper process that calls criu can deliver the open fd some other way? I wonder whether criu can access the caller's fds if I use swrk mode.

adrianreber commented 3 months ago

As mentioned in another ticket, CRIU can ignore all cgroup settings if the ignore mode is specified (--manage-cgroups=ignore). I'm not sure that helps in your case, because I don't know why and how your application uses the cgroup.

muvaf commented 3 months ago

@rst0git I ended up handling it in my wrapper program by opening the additional files read-only and having the child process inherit them via the ExtraFiles property.

@adrianreber My application (a WebKit browser) reads the cgroup directory to get the current/max/min memory values. However, the exact path includes the Kubernetes pod UID and the containerd container ID, e.g. /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod00c52549_27ae_4b84_a096_3077cc7f493c.slice/cri-containerd-e2e9b2135d7df413add6d27d9da8f80490fde6250a16765eca5d6c869cb23f6e.scope/memory.current, and that path changes when I restore into another pod. So it's not really about cgroups specifically, but about making sure an open fd still works after restore. I ended up computing the path in the new pod by reading /proc/self/cgroup. It all worked in the end, and I'm able to restore the browser together with the VNC server.
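The path recomputation described above can be sketched in Go, assuming cgroup v2 (a `0::<path>` line in /proc/self/cgroup) mounted at /sys/fs/cgroup. With a private cgroup namespace the reported path may be just `/`, in which case the result is /sys/fs/cgroup/memory.current. The function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path"
	"strings"
)

// unifiedCgroupPath returns the cgroup v2 path from the contents of
// /proc/self/cgroup, i.e. the text after "0::" on the unified-hierarchy line.
func unifiedCgroupPath(procCgroup string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(procCgroup))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "0::") {
			return strings.TrimPrefix(line, "0::"), nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no cgroup v2 (0::) entry found")
}

// memoryCurrentPath joins the cgroup path onto the (assumed) cgroup2 mount point.
func memoryCurrentPath(mountPoint, procCgroup string) (string, error) {
	cg, err := unifiedCgroupPath(procCgroup)
	if err != nil {
		return "", err
	}
	return path.Join(mountPoint, cg, "memory.current"), nil
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	p, err := memoryCurrentPath("/sys/fs/cgroup", string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(p)
}
```

The same join works for memory.max or memory.min by changing the final file name.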

The issue is about child processes inheriting fds, which is already possible, so I'll close it. Thanks guys!