checkpoint-restore / criu

Checkpoint/Restore tool
criu.org
Other
2.92k stars 585 forks source link

Move struct libsoccr_sk to soccr.h #2010

Open georgemakrakis opened 1 year ago

georgemakrakis commented 1 year ago

Can we move the implementation of the libsoccr_sk struct from soccr.c to the soccr.h? Otherwise when trying to use the soccr.h it in other files we get the following error.

error: dereferencing pointer to incomplete type 'struct libsoccr_sk'

I have built the library in version criu-3.17.1 with this change but not all the tests pass using the test/zdtm.py run -a --keep-going .

################### 6 TEST(S) FAILED (TOTAL 438/SKIPPED 49) ####################
 * zdtm/static/socket-tcpbuf6-local(unknown)
 * zdtm/static/socket-tcpbuf6(unknown)
 * zdtm/static/socket-tcpbuf-local(unknown)
 * zdtm/static/apparmor_stacking(h)
 * zdtm/static/socket-tcpbuf(unknown)
 * zdtm/static/pthread_timers(ns)
##################################### FAIL #####################################

I can try to fix the problem with the tests and issue a PR if needed.

Output of `criu --version`:

``` Version: 3.17.1 ```

Output of `criu check --all`:

``` Error (criu/cr-check.c:1273): Time namespaces are not supported Error (criu/cr-check.c:1283): IFLA_NEW_IFINDEX isn't supported Warn (criu/cr-check.c:1305): Pidfd store requires pidfd_getfd syscall which is not supported Warn (criu/cr-check.c:1334): Nftables based locking requires libnftables and set concatenations support Warn (criu/cr-check.c:804): ptrace(PTRACE_GET_RSEQ_CONFIGURATION) isn't supported. C/R of processes which are using rseq() won't work. Looks good but some kernel features are missing which, depending on your process tree, may cause dump or restore failure. ```

avagin commented 1 year ago

Could you add more details? soccr.h is used in a few places and criu is compiled without any issues.

avagin@avagin:~/git/criu$ git grep soccr.h criu/cr-check.c:#include "../soccr/soccr.h" criu/log.c:#include "../soccr/soccr.h" criu/net.c:#include "../soccr/soccr.h" criu/netfilter.c:#include "../soccr/soccr.h" criu/sk-inet.c:#include "../soccr/soccr.h" criu/sk-tcp.c:#include "../soccr/soccr.h" soccr/soccr.c:#include "soccr.h" soccr/test/tcp-conn.c:#include "../soccr.h" soccr/test/tcp-constructor.c:#include "soccr/soccr.h"

georgemakrakis commented 1 year ago

Apologies for the late response @avagin . The particular problem lies in the fact that when I try to access the members of the libsoccr_sk struct (e.g. fd, *src_addr) in a file in my project that uses soccr.h, I get the above error (dereferencing pointer). I think that this is because libsoccr_sk is implemented in soccr.c and not in soccr.h. When I do this modification and I recompile criu, I can access the members of the libsoccr_sk struct. As you mentioned, in all other places inside criu, it is used without any issues.

avagin commented 1 year ago

@georgemakrakis could you explain why you need to access these members? Probably, it is better to add helper functions to access them.

georgemakrakis commented 1 year ago

I need to such access to pass the file descriptor of a restored network socket to another process using Unix Domain Sockets. In an essence, I have 2 programs, one in C that I do all the operations regarding CRIU (checkpoint, restore) and one in Python that is running my webserver. The first one will pass the FD to the second one after the restore.

Helper functions will also work in my case.

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

avagin commented 1 year ago

I think helper functions are better in this case. If we expose libsoccr_sk to public headers, we will need to think about its backward compatibility...

georgemakrakis commented 1 year ago

@avagin It makes sense. Should I move on with the creation of these helper functions?

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

georgemakrakis commented 1 year ago

Hello @avagin , I have not forgotten about the helper functions for the libsoccr_sk struct, just got busy with work. While I am trying to focus only on the libsoccr part, I noticed that I have to move the include directory inside the soccr one to run the make test command (otherwise the build of tcp-constructor.c fails). Furthermore, I get an error about the redeclaration of enumerator ‘TCP_NO_QUEUE’ and the rest of them that reside in tcp-conn.c. I can also comment these out from the tcp-conn.c file but I am not sure if this is the cleanest way to do it, even if it seems that it works for now. Do you have any other suggestions on how to make all these work in a cleaner way?

georgemakrakis commented 1 year ago

After I posted the above I realized that even with all these "hacky" changes, and the test saying PASS in the end, I see an Error (soccr/soccr.c:105): Failed to turn off repair mode on socket: Bad file descriptor. Here is the full log:

cc -Wall -g -I../../ tcp-constructor.c -o tcp-constructor -L../ -lsoccr ../libsoccr.a -lnet
cc -Wall -g -I../../ tcp-conn.c -o tcp-conn -L../ -lsoccr ../libsoccr.a -lnet
cc -Wall -g -I../../ -DTEST_IPV6 tcp-conn-v6.c -o tcp-conn-v6 -L../ -lsoccr ../libsoccr.a -lnet
unshare -n sh -c "ip link set up dev lo; ./tcp-conn"
Debug:  Setting 1 queue seq to 225178792
Debug:  Setting 2 queue seq to 1212215004
Debug:  Restoring TCP options
Debug:      Will turn SAK on
Debug:      Will set snd_wscale to 10
Debug:      Will set rcv_wscale to 10
Debug:      Will turn timestamps on
Debug: Will set mss clamp to 65495
Debug:  Restoring TCP 1 queue data 11 bytes
Error (soccr/soccr.c:105): Failed to turn off repair mode on socket: Bad file descriptor
unshare -n sh -c "ip link set up dev lo; ./tcp-conn-v6"
Debug:  Setting 1 queue seq to 3142614254
Debug:  Setting 2 queue seq to 2409802264
Debug:  Restoring TCP options
Debug:      Will turn SAK on
Debug:      Will set snd_wscale to 10
Debug:      Will set rcv_wscale to 10
Debug:      Will turn timestamps on
Debug: Will set mss clamp to 65476
Debug:  Restoring TCP 1 queue data 11 bytes
Error (soccr/soccr.c:105): Failed to turn off repair mode on socket: Bad file descriptor
python run.py ./tcp-constructor
./tcp-constructor
127.0.0.1:12345:555
127.0.0.1:54321:666
127.0.0.1:12345:555
127.0.0.1:54321:666
dst: send() -> 10
dst: recv() -> 7300
dst: recv() -> 84732
dst: recv() -> 1201580
dst: recv() -> 471580
dst: recv() -> 414640
dst: recv() -> 435080
dst: recv() -> 499320
dst: recv() -> 481800
dst: recv() -> 505160
dst: recv() -> 531440
dst: recv() -> 78840
dst: recv() -> 621960
dst: recv() -> 585460
dst: recv() -> 614660
dst: recv() -> 645320
dst: recv() -> 677440
src: send() -> 9437184
src: recv() -> 10
dst: recv() -> 711020
dst: recv() -> 747520
dst: recv() -> 122332
PASS

Apologies for the "storm" of questions/issues.