checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

CRIU restores successfully, but the TCP socket connection is reset #2316

Closed: Jiehon closed this issue 6 months ago.

Jiehon commented 6 months ago

Description: Two processes run on one machine. We use CRIU to dump the process state, including the TCP socket state. The restore completes without any errors, but the TCP connection is then reset. We restore the server process first, but it receives a reset (RST) and its socket is destroyed.

```
<...>-19333 [016] .... 9000755.565623: tcp_reset_p: (tcp_reset+0x0/0x90) sock=0xffff88c09c8a5e80
<...>-19333 [016] .... 9000755.565623: tcp_receive_reset: sport=18234 dport=55840 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:10.127.25.21 sock_cookie=261806
<...>-19338 [018] .... 9000755.570078: tcp_v4_destroy_sock_p: (tcp_v4_destroy_sock+0x0/0x200) sock=0xffff88c09c8a0000
<...>-19338 [018] .... 9000755.570079: tcp_destroy_sock: sport=18234 dport=55812 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:xxx sock_cookie=261803
<...>-19338 [018] .... 9000755.570104: tcp_v4_destroy_sock_p: (tcp_v4_destroy_sock+0x0/0x200) sock=0xffff88c09c8a4980
<...>-19338 [018] .... 9000755.570104: tcp_destroy_sock: sport=18234 dport=55814 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:xxx sock_cookie=261804
<...>-19338 [018] .... 9000755.570111: tcp_v4_destroy_sock_p: (tcp_v4_destroy_sock+0x0/0x200) sock=0xffff88c09c8a6900
<...>-19338 [018] .... 9000755.570111: tcp_destroy_sock: sport=18234 dport=55838 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:xxx sock_cookie=261805
<...>-19338 [018] .... 9000755.570118: tcp_v4_destroy_sock_p: (tcp_v4_destroy_sock+0x0/0x200) sock=0xffff88c09c8a5e80
<...>-19338 [018] .... 9000755.570118: tcp_destroy_sock: sport=18234 dport=55840 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:xxx sock_cookie=261806
<...>-18309 [008] ..s1 9000755.940014: tcp_v4_send_reset_p: (tcp_v4_send_reset+0x0/0x6f0) sock=0x0
```
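To see which sockets were reset and which were only torn down, the trace can be grouped by `sock_cookie`. The following is a minimal sketch, assuming the ftrace line layout shown above (task-pid, `[CPU]`, flags, timestamp, event name, then `key=value` fields). The names `parse_trace` and `classify_destroyed` are hypothetical helpers written for this illustration, not part of CRIU or any tracing tool, and only the `tcp_receive_reset` and `tcp_destroy_sock` lines of the trace are included.

```python
import re
from typing import Dict, List, Tuple

# Excerpt of the trace above: only the tcp_receive_reset and tcp_destroy_sock events.
TRACE = """\
<...>-19333 [016] .... 9000755.565623: tcp_receive_reset: sport=18234 dport=55840 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:10.127.25.21 sock_cookie=261806
<...>-19338 [018] .... 9000755.570079: tcp_destroy_sock: sport=18234 dport=55812 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:xxx sock_cookie=261803
<...>-19338 [018] .... 9000755.570104: tcp_destroy_sock: sport=18234 dport=55814 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:xxx sock_cookie=261804
<...>-19338 [018] .... 9000755.570111: tcp_destroy_sock: sport=18234 dport=55838 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:xxx sock_cookie=261805
<...>-19338 [018] .... 9000755.570118: tcp_destroy_sock: sport=18234 dport=55840 saddr=xxx daddr=xxx saddrv6=::ffff:xxx daddrv6=::ffff:xxx sock_cookie=261806
"""

# Assumed layout: task-pid  [cpu]  flags  timestamp: event_name: key=value ...
LINE_RE = re.compile(r"^\S+\s+\[\d+\]\s+\S+\s+([\d.]+):\s+(\w+):\s+(.*)$")
FIELD_RE = re.compile(r"(\w+)=(\S+)")

Event = Tuple[float, str, Dict[str, str]]


def parse_trace(text: str) -> List[Event]:
    """Turn ftrace lines into (timestamp, event_name, fields) tuples."""
    events: List[Event] = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognised lines
        ts, name, rest = m.groups()
        events.append((float(ts), name, dict(FIELD_RE.findall(rest))))
    return events


def classify_destroyed(events: List[Event]) -> Dict[str, List[Tuple[str, str]]]:
    """Split destroyed sockets into those that received an RST first and those that did not."""
    reset_at: Dict[str, float] = {}
    out: Dict[str, List[Tuple[str, str]]] = {"after_reset": [], "no_reset": []}
    for ts, name, fields in sorted(events, key=lambda e: e[0]):
        cookie = fields.get("sock_cookie")
        if cookie is None:
            continue  # probe events without a cookie cannot be correlated
        if name == "tcp_receive_reset":
            reset_at.setdefault(cookie, ts)
        elif name == "tcp_destroy_sock":
            key = "after_reset" if cookie in reset_at else "no_reset"
            out[key].append((cookie, fields.get("dport", "?")))
    return out


if __name__ == "__main__":
    result = classify_destroyed(parse_trace(TRACE))
    for key, socks in result.items():
        for cookie, dport in socks:
            print(f"{key}: sock_cookie={cookie} dport={dport}")
```

For this excerpt, only `sock_cookie=261806` (dport 55840) records an incoming RST before it is destroyed; the other three sockets on sport 18234 are destroyed by the same task about 4.5 ms after that RST without any reset recorded for them.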

If we restore both processes at the same time, the TCP connection is also reset.
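The report does not say which options were used. As a minimal sketch of a dump/restore cycle that keeps established TCP connections, the helpers below only build the command lines; they do not run CRIU. `--tcp-established` has to be passed on both dump and restore (without it, CRIU is expected to refuse to dump a process that owns a connected TCP socket). The function names, the PID `1234` and the directory `/tmp/ckpt` are hypothetical and used for illustration only.

```python
import shlex
from typing import List


def criu_dump_cmd(pid: int, images_dir: str) -> List[str]:
    """Command line for checkpointing the process tree rooted at `pid`."""
    return [
        "criu", "dump",
        "--tree", str(pid),          # root of the process tree to checkpoint
        "--images-dir", images_dir,  # where the image files are written
        "--tcp-established",         # also checkpoint established TCP connections
        "-v4", "--log-file", "dump.log",
    ]


def criu_restore_cmd(images_dir: str) -> List[str]:
    """Command line for restoring from the images written by criu_dump_cmd()."""
    return [
        "criu", "restore",
        "--images-dir", images_dir,
        "--tcp-established",         # required again at restore time
        "--restore-detached",        # let criu exit once the restore is complete
        "-v4", "--log-file", "restore.log",
    ]


if __name__ == "__main__":
    # Hypothetical PID and images directory, for illustration only.
    print(shlex.join(criu_dump_cmd(1234, "/tmp/ckpt")))
    print(shlex.join(criu_restore_cmd("/tmp/ckpt")))
```

Building an argument list rather than a shell string keeps the paths safe to pass to `subprocess.run` unchanged; `shlex.join` is used here only to print them readably.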

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

CRIU logs and information:

CRIU full dump/restore logs:

The CRIU restore.log contains no errors:

```
xxx
(00.231980) 2922 was stopped
(00.232003) 2984 was trapped
(00.232011) 2984 (native) is going to execute the syscall 11, required is 11
(00.232263) 2984 was stopped
(00.232270) Running pre-resume scripts
(00.232458) Writing stats
(00.232520) Running post-resume scripts
```
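A clean restore.log only shows that the restore itself finished; the reset in the trace above happens after the tasks resume, so it would not appear in this log. A minimal sketch for scanning a log for problems, assuming CRIU tags its messages as `Error (file:line)` and `Warn  (file:line)` (adjust the pattern if your CRIU version formats them differently). `problem_lines` is a hypothetical helper, not a CRIU tool.

```python
import re
from typing import List

# Excerpt of the restore.log quoted above.
RESTORE_LOG = """\
(00.231980) 2922 was stopped
(00.232003) 2984 was trapped
(00.232011) 2984 (native) is going to execute the syscall 11, required is 11
(00.232263) 2984 was stopped
(00.232270) Running pre-resume scripts
(00.232458) Writing stats
(00.232520) Running post-resume scripts
"""

# Assumption: error and warning messages carry an "Error (" or "Warn  (" tag.
PROBLEM_RE = re.compile(r"\b(?:Error|Warn)\s+\(")


def problem_lines(log_text: str) -> List[str]:
    """Return the log lines that look like CRIU errors or warnings."""
    return [line for line in log_text.splitlines() if PROBLEM_RE.search(line)]


if __name__ == "__main__":
    found = problem_lines(RESTORE_LOG)
    print(f"{len(found)} error/warning lines")
    for line in found:
        print(line)
```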

Output of `criu --version`:

```
3.6
```

Output of `criu check --all`:


Additional environment details:

adrianreber commented 6 months ago

Sorry, but it is not clear what you are trying to do and what your problem is.

Please do not create such long titles. Try to add your problem description to the ticket and not to the title.

Jiehon commented 6 months ago

Sorry, this bug has already been fixed in commit dc384c0e30203092d278f4dfcd7b89729f48b44b.

adrianreber commented 6 months ago

Ah, I see, you were using an ancient CRIU version. You should always use the newest release. Please close the ticket if it is resolved.