checkpoint-restore / criu

Checkpoint/Restore tool
criu.org
Other
2.9k stars 583 forks source link

criu: fix a fatal failure if nft doesn't work #2403

Closed kolyshkin closed 4 months ago

kolyshkin commented 4 months ago

On some systems, nft binary might not be installed, or some kernel options might be unconfigured, resulting in something like this:

sudo unshare -n nft create table inet CRIU
Error: Could not process rule: Operation not supported
create table inet CRIU
^^^^^^^^^^^^^^^^^^^^^^^

This is similar to what kerndat_has_nftables_concat() does, and if the outcome is the same, it returns an error to kerndat_init(), and an error from kerndat_init() is considered fatal.

Let's relax the check, returning mere "feature not working" instead of a fatal error.


This was discovered while running criu CI on ARM via actuated ci env generously provided by @alexellis. Currently it runs kernel 6.1.90 with the following config:

grep CONFIG_NF_TABLES /boot/config-6.1.90
CONFIG_NF_TABLES=m
# CONFIG_NF_TABLES_INET is not set
# CONFIG_NF_TABLES_NETDEV is not set
# CONFIG_NF_TABLES_IPV4 is not set
# CONFIG_NF_TABLES_ARP is not set
# CONFIG_NF_TABLES_IPV6 is not set
CONFIG_NF_TABLES_BRIDGE=m

I guess that missing CONFIG_NF_TABLES_INET is the source of the issue.

Adding this patch on top of current criu-dev fixes all failures of runc c/r tests (see e.g. https://github.com/opencontainers/runc/actions/runs/9024939764/job/24799738301).

How it failed before the fix

Failed run (using criu_3.19-2_arm64.deb from https://download.opensuse.org/repositories/devel:/tools:/criu/xUbuntu_22.04) looks like this (from https://github.com/opencontainers/runc/actions/runs/9023994162/job/24796971230):

=== RUN   TestUsernsCheckpoint
    checkpoint_test.go:45: Test requires userns
--- SKIP: TestUsernsCheckpoint (0.18s)
=== RUN   TestCheckpoint
time="2024-05-09T21:51:52Z" level=warning msg="--- Quoting \"/tmp/TestCheckpoint2075305918/003/criu/dump.log\""
time="2024-05-09T21:51:52Z" level=warning msg="17:(00.005350) net: Restoring netdev veth idx 10"
time="2024-05-09T21:51:52Z" level=warning msg="18:(00.006183) net: Dumping netns links"
time="2024-05-09T21:51:52Z" level=warning msg="19:(00.006241) net: \tLD: Got link 1, type 772"
time="2024-05-09T21:51:52Z" level=warning msg="20:(00.006249) net: \tLD: Got link 2, type 776"
time="2024-05-09T21:51:52Z" level=warning msg="21:(00.006256) net: \tLD: Got link 10, type 1"
time="2024-05-09T21:51:52Z" level=warning msg="22:(00.028176) Error (criu/kerndat.c:1563): Can't create nftables table"
time="2024-05-09T21:51:52Z" level=warning msg="23:(00.028262) Error (criu/util.c:1495): Can't wait or bad status: errno=0, status=256"
time="2024-05-09T21:51:52Z" level=warning msg="24:(00.028304) Error (criu/kerndat.c:1718): kerndat_has_nftables_concat failed when initializing kerndat."
time="2024-05-09T21:51:52Z" level=warning msg=---
    checkpoint_test.go:117: criu failed: type DUMP errno 0
--- FAIL: TestCheckpoint (0.25s)

(failures in runc integration tests are similar).

kolyshkin commented 4 months ago

Cc @adrianreber @ZeyadYasser @avagin

avagin commented 4 months ago

Applied. Thanks!

rst0git commented 4 months ago

LGTM. Closing as the patch has been applied to the criu-dev branch.

Snorch commented 4 months ago

The patch looks good.

But it seems that kdat.has_nftables_concat is unused after set, do we really need this code? @avagin @ZeyadYasser maybe you know more?

(I mean the only consequence of it set is Nftables based locking requires libnftables and set concatenations support warning)