checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

Refactor criu plugin baseline v1 #2295

Open rerrabolu opened 8 months ago

rerrabolu commented 8 months ago

Could someone advise me on what is blocking this PR? The 3 failing tests fail in the "build" phase, and it is not clear how that is connected to my change. What are the next steps?

Snorch commented 8 months ago
# 1  DCO
Commit sha: [ff42a4f](https://github.com/checkpoint-restore/criu/pull/2295/commits/ff42a4f97f41849c45f5331dfa3c1b5840baa028), Author: Ramesh Errabolu, Committer: GitHub; The sign-off is missing.

# 2 CentOS Stream 8 based test
------------------------ grep Error ------------------------
b'(00.011876)     53: cg: setting cgns prefix to /system.slice/google-startup-scripts.service'
b'(00.011883)     53: cg: setting cgns prefix to /system.slice/google-startup-scripts.service'
b'(00.011890)     53: cg: setting cgns prefix to /system.slice/google-startup-scripts.service'
b'(00.011898)     53: cg: setting cgns prefix to /test'
b"(00.011905)     53: Error (criu/cgroup.c:1135): cg: Can't move 53 into zdtmtst//test/cgroup.procs (-1/-1): No such file or directory"
b"(00.011907)     53: Error (criu/cgroup.c:1191): cg: couldn't set cgns prefix zdtmtst//test/cgroup.procs: No such file or directory"
b'(00.011909)     53: Error (criu/cgroup.c:1282): cg: failed preparing cgns'
b'(00.012183) Error (criu/cr-restore.c:1513): 53 exited, status=1'
b'(00.012190) Error (criu/cr-restore.c:2557): Restoring FAILED.'
b'(00.016876) Error (criu/cgroup.c:1970): cg: cgroupd: recv req error: No such file or directory'
------------------------ ERROR OVER ------------------------
################ Test zdtm/static/cgroupns FAIL at CRIU restore ################

# 3 CentOS Stream 9 based test
gcc -g -Wall -Werror -D _GNU_SOURCE -shared -nostartfiles -fPIC -DCR_PLUGIN_DEFAULT="/usr/local/lib/criu" -I ../../compel/include/uapi amdgpu_plugin.c amdgpu_plugin_drm.c amdgpu_plugin_topology.c amdgpu_plugin_util.c criu-amdgpu.pb-c.c -o amdgpu_plugin.so -iquote../../include -iquote../../criu/include -iquote../../criu/arch/x86/include/ -iquote../../ -lpthread -lrt -ldrm -ldrm_amdgpu -I/usr/include/libdrm
/usr/bin/ld: /tmp/ccrGAnOx.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_util.c:52: multiple definition of `fd_next'; /tmp/cczVlqFR.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_topology.c:69: first defined here
/usr/bin/ld: /tmp/ccrGAnOx.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_util.c:54: multiple definition of `kfd_fw_version_check'; /tmp/cczVlqFR.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_topology.c:50: first defined here
/usr/bin/ld: /tmp/ccrGAnOx.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_util.c:55: multiple definition of `kfd_sdma_fw_version_check'; /tmp/cczVlqFR.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_topology.c:52: first defined here
/usr/bin/ld: /tmp/ccrGAnOx.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_util.c:56: multiple definition of `kfd_caches_count_check'; /tmp/cczVlqFR.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_topology.c:54: first defined here
/usr/bin/ld: /tmp/ccrGAnOx.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_util.c:57: multiple definition of `kfd_num_gws_check'; /tmp/cczVlqFR.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_topology.c:56: first defined here
/usr/bin/ld: /tmp/ccrGAnOx.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_util.c:58: multiple definition of `kfd_vram_size_check'; /tmp/cczVlqFR.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_topology.c:58: first defined here
/usr/bin/ld: /tmp/ccrGAnOx.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_util.c:59: multiple definition of `kfd_numa_check'; /tmp/cczVlqFR.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_topology.c:60: first defined here
/usr/bin/ld: /tmp/ccrGAnOx.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_util.c:60: multiple definition of `kfd_capability_check'; /tmp/cczVlqFR.o:/tmp/criu/plugins/amdgpu/amdgpu_plugin_topology.c:62: first defined here
collect2: error: ld returned 1 exit status
make[3]: *** [Makefile:32: amdgpu_plugin.so] Error 1
make[2]: *** [Makefile:347: amdgpu_plugin] Error 2
make[2]: Leaving directory '/tmp/criu'
make[1]: *** [Makefile:4: run] Error 2
make[1]: Leaving directory '/tmp/criu/test/others/make'
+ cleanup_cgroup
+ ./test/zdtm_umount_cgroups 12456
make: *** [Makefile:2: local] Error 2
make: Leaving directory '/tmp/criu/scripts/ci'

# 4 Vagrant Fedora Rawhide based test 
------------------------ grep Error ------------------------
b'(00.031043)     59: net: \tRunning ip rule delete table local'
b'(00.034514)     59: net: \tRunning ip rule restore'
b'(00.051113)     59: net: \tRunning iptables-restore -w for iptables-restore -w'
b'(00.055342)     59: net: \tRunning ip6tables-restore -w for ip6tables-restore -w'
b'(00.060395)     59: Error (criu/libnetlink.c:54): -16 reported by netlink: Device or resource busy'
b"(00.060445)     59: Error (criu/util.c:1495): Can't wait or bad status: errno=0, status=65280"
b'(00.060792) Error (criu/cr-restore.c:2557): Restoring FAILED.'
------------------------ ERROR OVER ------------------------
######### Test zdtm/static/socket-tcp-nfconntrack FAIL at CRIU restore #########

You can open the details section of each check and see the errors yourself. 1 and 3 are obviously introduced by your code, so you should fix them. 4 is definitely unrelated to your code, as I have seen it in other PRs; 2 is likely unrelated, and I triggered a rerun for it. I also approved all checks for your PR, so you may see some more failures.
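For reference, one common cause of the `multiple definition of ...` linker errors in check 3 is a variable defined in a header (or defined separately in two `.c` files) that are then linked together; GCC 10 and later default to `-fno-common`, which turns such duplicates into hard errors. The sketch below shows the usual pattern of one `extern` declaration plus exactly one definition. It is only an illustration: the header name, the variable types and initial values, and the `claim_next_fd()` helper are assumptions, not taken from the PR.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Part 1: what would go in a shared header (hypothetical name:
 * plugin_flags.h). "extern" makes these declarations only, so
 * every .c file may include the header without allocating storage.
 * The types are assumed for illustration.
 */
extern int fd_next;
extern bool kfd_fw_version_check;

/*
 * Part 2: what would go in exactly ONE .c file. These are the only
 * definitions the linker sees, so there is nothing to collide with.
 * The initial values are assumed for illustration.
 */
int fd_next = -1;
bool kfd_fw_version_check = true;

/* Hypothetical helper, only here to show the shared variable in use. */
int claim_next_fd(void)
{
	assert(fd_next >= -1);
	return ++fd_next;
}
```

If a symbol is only needed inside one file, marking it `static` there is another way to keep it out of the link-time namespace.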

codecov-commenter commented 8 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (c474816) 70.55% compared to head (3ad74b3) 70.62%.

:exclamation: Current head 3ad74b3 differs from pull request most recent head ff42a4f. Consider uploading reports for the commit ff42a4f to get more accurate results

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##           criu-dev    #2295      +/-   ##
============================================
+ Coverage     70.55%   70.62%   +0.06%
============================================
  Files           132      133       +1
  Lines         33508    33312     -196
============================================
- Hits          23642    23525     -117
+ Misses         9866     9787      -79
```

[see 21 files with indirect coverage changes](https://app.codecov.io/gh/checkpoint-restore/criu/pull/2295/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=checkpoint-restore)


github-actions[bot] commented 7 months ago

A friendly reminder that this PR had no activity for 30 days.

dayatsin-amd commented 7 months ago

Overall, the general idea/concept of this refactor is fine.

github-actions[bot] commented 6 months ago

A friendly reminder that this PR had no activity for 30 days.