Restored processes bind to the same address and port when restoring multiple times into different PID namespaces

Tianyang-Zhang commented 1 year ago

Description

I have a simple program that opens a TCP socket, binds it to address 0.0.0.0 and port 5050, and listens on it. If I checkpoint it, kill the original process (if the original process is not killed, the behavior is different; see below), and restore the same snapshot multiple times into different PID namespaces, all of the restored processes are bound to 0.0.0.0:5050. But only the most recently restored process can receive messages through the socket. If I kill the last-restored process, the second-to-last-restored one becomes able to receive messages.
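For reference, here is a minimal sketch of a program like the one described above. It is an assumption about its shape, not the original code: the helper names `open_listener()` and `serve_forever()` are hypothetical, and the sketch sets no socket options at all.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket bound to addr:port,
 * or -1 on error. No SO_REUSEADDR / SO_REUSEPORT is set here. */
int open_listener(const char *addr, uint16_t port)
{
    struct sockaddr_in sa;
    int fd;

    assert(addr != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accepts clients one at a time and prints what they send. */
void serve_forever(int lfd)
{
    char buf[256];

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            continue;
        ssize_t n;
        while ((n = read(cfd, buf, sizeof(buf) - 1)) > 0) {
            buf[n] = '\0';
            printf("got: %s\n", buf);
            fflush(stdout);
        }
        close(cfd);
    }
}
```

Calling `open_listener("0.0.0.0", 5050)` from `main()` and passing the descriptor to `serve_forever()` gives the setup from the report. Because each restore runs in its own console, the console in which a line appears shows which restored copy received it. Printing `getpid()` would not help here, since CRIU restores the same PID inside each new PID namespace.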

If the original process is not killed, all restores fail with an "address already in use" error, which is expected. However, if the original process is killed, every restore succeeds.
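A quick way to check from outside CRIU whether the port is really still held is to attempt a plain `bind()` with no reuse options and look at `errno`. The following is a minimal sketch; `probe_bind()` is a hypothetical helper name, and the socket is closed again right away so the probe never keeps the port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if addr:port could be bound by a plain TCP
 * socket (no SO_REUSEADDR / SO_REUSEPORT), otherwise the errno from bind(),
 * e.g. EADDRINUSE when another socket holds the port. */
int probe_bind(const char *addr, uint16_t port)
{
    struct sockaddr_in sa;
    int err = 0;
    int fd;

    assert(addr != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1)
        err = EINVAL;
    else if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        err = errno;    /* saved before close() can touch errno */
    close(fd);          /* the probe never keeps the port */
    return err;
}
```

Called as `probe_bind("0.0.0.0", 5050)` while a restored copy is alive, a result of `EADDRINUSE` means the port is held as far as ordinary sockets are concerned.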

I tried to trace the cause. Near the end of `cr-restore.c::sigreturn_restore()`, the restore calls `close_service_fd(TRANSPORT_FD_OFF)`. If one CRIU restore process restores the socket but then sleeps before calling `close_service_fd(TRANSPORT_FD_OFF)`, all other restores fail with "address in use". Once `close_service_fd(TRANSPORT_FD_OFF)` has been called, the address and port somehow become free to bind in another CRIU restore process (although binding them from outside CRIU still fails).

I haven't figured out why `close_service_fd(TRANSPORT_FD_OFF)` is related to socket binding.

Steps to reproduce the issue:

1. Create a program that opens a TCP socket, binds it to any address and a random port, and listens on it.
2. Checkpoint it with `sudo criu dump --tree <pid> --images-dir <dir> --leave-running --shell-job --tcp-close`.
3. Kill the original process.
4. Create new PID namespaces in different consoles using `sudo unshare --pid --fork --mount-proc`.
5. Restore multiple times, each time into a different namespace.
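To see which restored copy actually receives data after the restores, a small client can connect and send one message. This is a minimal sketch; `send_message()` is a hypothetical helper, and the target is assumed to be reachable as 127.0.0.1 on the port the server was bound to (5050 in the report), from the same network namespace.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connects to addr:port, sends msg, closes the
 * connection. Returns 0 on success, -1 if connecting or sending failed. */
int send_message(const char *addr, uint16_t port, const char *msg)
{
    struct sockaddr_in sa;
    size_t len, off = 0;
    int fd;

    assert(addr != NULL && msg != NULL);
    len = strlen(msg);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        goto fail;

    while (off < len) {
        /* MSG_NOSIGNAL: report a reset peer as an error instead of SIGPIPE */
        ssize_t n = send(fd, msg + off, len - off, MSG_NOSIGNAL);
        if (n <= 0)
            goto fail;
        off += (size_t)n;
    }
    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

For example, `send_message("127.0.0.1", 5050, "ping")` after each restore; only the console of the copy that received the connection should print the message.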

Describe the results you received: All of the restores succeed (I expected only one to succeed), but only the last one can receive messages through that TCP socket.

Describe the results you expected: Only the first restore succeeds, and all other restores fail with an "address already in use" error.

Additional information you deem important (e.g. issue happens only occasionally): If the original process is kept running, the behavior is correct: all restores into a new PID namespace fail with an "address in use" error.

CRIU logs and information:

CRIU full dump/restore logs:

Dump log: [dump_log.txt](https://github.com/checkpoint-restore/criu/files/11193484/dump_log.txt)

Restore log when restoring into a new PID namespace while the original process is still alive; it fails with `addr in use`, as expected: [restore_to_PID-NS_log_when_original_proc_alive.txt](https://github.com/checkpoint-restore/criu/files/11193495/no_kill_restore_log.txt)

Restore logs when restoring twice, once into the host PID namespace and once into a new PID namespace; both succeed (the second was expected to fail): [kill_original_proc_and_res_to_host_PID-NS.txt](https://github.com/checkpoint-restore/criu/files/11193502/res_to_host.txt) and [kill_original_proc_and_res_to_new_PID-NS.txt](https://github.com/checkpoint-restore/criu/files/11193503/res_to_new.txt)

Output of `criu --version`:

```
Version: 3.17.1
GitID: v3.17.1
```

Output of `criu check --all`:

```
./criu/criu check --all
Warn (criu/cr-check.c:1231): clone3() with set_tid not supported
Error (criu/cr-check.c:1273): Time namespaces are not supported
Error (criu/cr-check.c:1283): IFLA_NEW_IFINDEX isn't supported
Warn (criu/cr-check.c:1300): Pidfd store requires pidfd_open syscall which is not supported
Warn (criu/cr-check.c:1334): Nftables based locking requires libnftables and set concatenations support
Warn (criu/cr-check.c:804): ptrace(PTRACE_GET_RSEQ_CONFIGURATION) isn't supported. C/R of processes which are using rseq() won't work.
Looks good but some kernel features are missing which, depending on your process tree, may cause dump or restore failure.
```

Additional environment details:

OS:

```
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
```

Kernel:

```
$ uname -r
4.18.0-240.10.1.el8_3.x86_64
```

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

Tianyang-Zhang commented 1 year ago

Could anyone help with this issue?

sakgoyal commented 1 year ago

Did you set the `SO_REUSEADDR` option when you created the socket?
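For context on this question: on Linux, `SO_REUSEADDR` alone does not let two TCP sockets listen on the same address and port at the same time. In the minimal sketch below, both sockets set the option, yet the second `bind()` still fails with `EADDRINUSE` because the first socket is already listening. The helper names `bind_reuseaddr()` and `local_port()` are hypothetical, and loopback with a kernel-chosen port stands in for 0.0.0.0:5050 so the example is self-contained.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a TCP socket with SO_REUSEADDR set and binds
 * it to 127.0.0.1:port (0 = kernel picks). Returns the fd, or -errno. */
int bind_reuseaddr(uint16_t port)
{
    int one = 1, err;
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -errno;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
        goto fail;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        goto fail;
    return fd;
fail:
    err = errno;
    close(fd);
    return -err;
}

/* Hypothetical helper: returns the local port a bound socket ended up on. */
uint16_t local_port(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return 0;
    return ntohs(sa.sin_port);
}
```

Usage: bind and `listen()` a first socket with `bind_reuseaddr(0)`, then call `bind_reuseaddr(local_port(first))`; the second call returns `-EADDRINUSE`. On Linux, what `SO_REUSEADDR` mainly gives a TCP server is the ability to bind while earlier connections on that port are still in `TIME_WAIT`.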

Snorch commented 1 year ago

> did you set the SO_REUSEADDR option when you create the socket?

Looking into the code, I see one problem with `SO_REUSEPORT`, but it does not seem to be related to this issue...

See how in `post_open_inet_sk()` we say `/* SO_REUSEADDR is set for all sockets */`, meaning that CRIU first restores all sockets with `SO_REUSEADDR` and `SO_REUSEPORT` set, for CRIU's own needs, and only then restores them to the original dumped state.

But as you can also see in `post_open_inet_sk()`, TCP connections are handled differently: restoring their options is delayed until `prepare_tcp_socks()`, where we only care about the address reuse part and not about port reuse.

But in your logs I don't see the message "pie: Turning repair off for", which would indicate that the code flow passed through the above.

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

checkpoint-restore / criu #2153