Closed rst0git closed 1 week ago
Please make sure to add a test
Cockpit tests failed for commit 731fe643f029d0205cb13c570411cbff3e9ee55c. @martinpitt, @jelly, @mvollmer please check.
LGTM
Ran the ginkgo tests on my laptop using --runc and saw consistent failures of the form "expected 1 to equal 2":
https://github.com/containers/podman/blob/298f31ba6fc1efcf6630282c45bb80b3f95f2534/test/e2e/checkpoint_test.go#L1243
Is there some CI host on which crun restore --lsm-mount-context is implemented?
The --lsm-mount-context option was implemented only in runc (https://github.com/opencontainers/runc/pull/3068). crun currently doesn't support this option. When crun is used, Podman shows the following error message:
$ sudo podman container restore --pod=test --import=/tmp/test.tar.gz
Error: runtime /usr/bin/crun does not support pod restore
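Since the error only reports that the runtime is unsupported, one quick way to confirm which runtime carries the flag is to look for it in the runtime's own restore help output. The helper below is a hypothetical illustration of that probe, not Podman's actual detection logic:

```go
package main

import (
	"fmt"
	"strings"
)

// supportsLSMMountContext reports whether a runtime's `restore --help`
// output mentions the --lsm-mount-context flag. Purely illustrative:
// Podman does not detect support this way.
func supportsLSMMountContext(helpText string) bool {
	return strings.Contains(helpText, "--lsm-mount-context")
}

func main() {
	// Example help snippets; runc >= 1.1 documents the flag, crun does not.
	runcHelp := "--lsm-mount-context value  set the LSM mount context to restore"
	crunHelp := "restore a container from a checkpoint"
	fmt.Println(supportsLSMMountContext(runcHelp)) // true
	fmt.Println(supportsLSMMountContext(crunHelp)) // false
}
```

In practice you would feed this the output of `runc restore --help` or `crun restore --help`.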
> Ran ginkgo tests on my laptop using --runc and see consistent failures of the form "expected 1 to equal 2"
When running the tests locally as shown below, they pass
$ git diff
diff --git a/test/e2e/checkpoint_test.go b/test/e2e/checkpoint_test.go
index 58198fe37..de571139a 100644
--- a/test/e2e/checkpoint_test.go
+++ b/test/e2e/checkpoint_test.go
@@ -1119,7 +1119,7 @@ var _ = Describe("Podman checkpoint", func() {
share := share // copy into local scope, for use inside function
index := index
- It(testName, func() {
+ FIt(testName, func() {
podName := "test_pod"
if err := criu.CheckForCriu(criu.PodCriuVersion); err != nil {
$ sudo make localintegration FOCUS_FILE=checkpoint_test.go OCI_RUNTIME=runc
...
Timeline >>
> Enter [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 06:28:58.155
< Exit [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 06:28:58.155 (0s)
> Enter [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 06:28:58.156
integration timing results
Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net) 64.774286
Podman checkpoint podman checkpoint and restore container out of and into pod (net,uts,pid) 64.943064
Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net,uts,pid) 65.902908
Podman checkpoint podman checkpoint and restore container out of and into pod (net,uts) 65.931323
Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net,uts) 65.936874
Podman checkpoint podman checkpoint and restore container out of and into pod (uts,pid) 69.145405
< Exit [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 06:29:00.734 (2.578s)
<< Timeline
------------------------------
Ran 6 of 2226 Specs in 127.597 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 2220 Skipped
@edsantiago Would you be able to replicate this problem manually? Do you see any error messages?
podman pod create --name=test
podman run -d --pod=test --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
podman container checkpoint -l --export=/tmp/test.tar.gz
podman rm -a
podman pod rm -a
podman pod create --name=test
podman container restore --pod=test --import=/tmp/test.tar.gz
> Is there some CI host on which crun restore --lsm-mount-context is implemented?

> The --lsm-mount-context option was implemented only in runc (opencontainers/runc#3068). crun currently doesn't support this option. When crun is used, Podman would show the following error message:
>
> $ sudo podman container restore --pod=test --import=/tmp/test.tar.gz
> Error: runtime /usr/bin/crun does not support pod restore
Note that we recently dropped all runc testing from CI and only test with crun. Is there any chance we can add support to crun, as it is our default in Fedora and other distros? cc @giuseppe
@adrianreber is --lsm-mount-context something we could also have in crun?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: giuseppe, Luap99, rst0git, stano45
The full list of commands accepted by this bot can be found here.
The pull request process is described here
> Is there any chance we can add support to crun as this is what our default is in fedora and other distros?
Yes, we discussed this with @adrianreber yesterday. I will try to implement support for --lsm-mount-context in crun and open a pull request.
I ran the tests locally; they failed on main with the reported error and pass with this patch.
Can you share your test command? This is mine, and it fails 100%:
$ sudo ./ginkgo --runc "checkpoint and restore container out of and into pod"
(where ./ginkgo is a simple homebrewed wrapper). I also tried:
$ sudo env FOCUS="checkpoint and restore container out of and into pod" OCI_RUNTIME=runc make localintegration
All tests fail with "1 is not 2". The relevant error seems to be:
Running: podman [...] restore --pod XXX -i foo.tar.gz
Error: pod XXX does not share the SOMETHING namespace
runc-1.1.12-3.fc40.x86_64 criu-3.19-4.fc40.x86_64
(I know this is merged and this is runc so we probably don't care, but I'm annoying that way)
Attachment: ginkgo script
FOCUS_FILE maybe? But yes, I also tried ./ginkgo --runc checkpoint_test.go. Same failures.
It worries me a little that podman container restore can barf with "Error blah blah namespace" while exiting 0.
@edsantiago Would you be able to run the tests in a different environment (e.g., VM)?
The following is the output of the tests on Fedora 39 with runc v1.1.12 and criu v3.19:
$ sudo make localintegration FOCUS="checkpoint and restore container out of and into pod" OCI_RUNTIME=runc
...
Timeline >>
> Enter [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 13:48:05.147
< Exit [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 13:48:05.147 (0s)
> Enter [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 13:48:05.148
integration timing results
Podman checkpoint podman checkpoint and restore container out of and into pod (net,uts,pid) 67.081255
Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net,uts,pid) 67.084638
Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net) 67.194082
Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net,uts) 68.621705
Podman checkpoint podman checkpoint and restore container out of and into pod (net,uts) 69.706094
Podman checkpoint podman checkpoint and restore container out of and into pod (uts,pid) 72.526000
< Exit [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 13:48:07.555 (2.407s)
<< Timeline
------------------------------
Ran 6 of 2226 Specs in 108.399 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 2220 Skipped
Ginkgo ran 1 suite in 2m0.206898675s
Test Suite Passed
It looks like the restore command completes successfully, and restores a container with ID 01686bf7.
Running: /root/go/podman/bin/podman --storage-opt overlay.imagestore=/tmp/podman-e2e-477465866/imagecachedir --root /tmp/podman-e2e-477465866/subtest-127715438/root --runroot /tmp/podman-e2e-477465866/subtest-127715438/runroot --runtime runc --conmon /usr/bin/conmon --network-config-dir /etc/containers/networks --network-backend netavark --cgroup-manager systemd --tmpdir /tmp/podman-e2e-477465866/subtest-127715438 --events-backend file --db-backend sqlite --storage-driver overlay container restore --pod test_pod -i /tmp/podman-e2e-477465866/subtest-127715438/checkpoint-01686bf717523235de4c1e1db90d10711deda62d4682f0fc0fdeeac2fbc32844.tar.gz
01686bf717523235de4c1e1db90d10711deda62d4682f0fc0fdeeac2fbc32844
We expect NumberOfContainersRunning() to be 2 because both the restored container and the Pod infra container are supposed to be running.
Running: /root/go/podman/bin/podman --root /tmp/podman-e2e-477465866/subtest-127715438/root --runroot /tmp/podman-e2e-477465866/subtest-127715438/runroot --runtime runc --conmon /usr/bin/conmon --network-config-dir /etc/containers/networks --network-backend netavark --cgroup-manager systemd --tmpdir /tmp/podman-e2e-477465866/subtest-127715438 --events-backend file --db-backend sqlite --storage-driver overlay ps -q
60873afce780
[FAILED] Expected
<int>: 1
to equal
<int>: 2
In [It] at: /root/go/podman/test/e2e/checkpoint_test.go:1231 @ 06/21/24 09:29:55.276
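The failing assertion boils down to counting the container IDs that podman ps -q prints, one per line (the real test uses podmanTest.NumberOfContainersRunning(), as noted above). A minimal, self-contained sketch of that count, with countRunning being a made-up name for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// countRunning counts container IDs in `podman ps -q` output, one ID
// per line. After a successful restore into a pod there should be 2:
// the infra container plus the restored container.
func countRunning(psQ string) int {
	// Fields splits on any whitespace and drops empty entries, so a
	// trailing newline does not produce a phantom container.
	return len(strings.Fields(psQ))
}

func main() {
	// Output observed in the failing run: only the infra container is up.
	fmt.Println(countRunning("60873afce780\n")) // 1, but the test expects 2
}
```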
@edsantiago Would you be able to check what containers are running, and what is the state of the restored container?
My apologies: "Error blah blah namespace" is a red herring; part of the test.
Here's a reproducer on f40:
# bin/podman --runtime runc pod create --name test_pod --share uts,pid
e1752af0a7161c998e350dab58e781f8ba5dc69c83b531309b46e31cb293796f
# bin/podman --runtime runc run -d --rm --pod e1752 quay.io/libpod/testimage:20240123 top
Trying to pull quay.io/libpod/testimage:20240123...
...
f6802669af74233b115780fe9979702ee76a4adf86c87072f4d771929c082d04
# bin/podman --runtime runc container checkpoint -e /tmp/foo.tar.gz f6802
WARN[0033] freezer not supported: openat2 /sys/fs/cgroup/machine.slice/machine-libpod_pod_e1752af0a7161c998e350dab58e781f8ba5dc69c83b531309b46e31cb293796f.slice/libpod-f6802669af74233b115780fe9979702ee76a4adf86c87072f4d771929c082d04.scope/cgroup.freeze: no such file or directory
WARN[0033] lstat /sys/fs/cgroup/machine.slice/machine-libpod_pod_e1752af0a7161c998e350dab58e781f8ba5dc69c83b531309b46e31cb293796f.slice/libpod-f6802669af74233b115780fe9979702ee76a4adf86c87072f4d771929c082d04.scope: no such file or directory
f6802
# bin/podman --runtime runc ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2225b8721eff localhost/podman-pause:5.2.0-dev-1718984608 2 minutes ago Up About a minute e1752af0a716-infra
# bin/podman --runtime runc pod rm e1752
e1752af0a7161c998e350dab58e781f8ba5dc69c83b531309b46e31cb293796f
# bin/podman --runtime runc pod create --name test_pod --share uts,pid
9e9a447c71a4dfb79da206d4fef0476f54a193129ac82ae2b605f352504ef5c4
# bin/podman --runtime runc container restore --pod 9e9a -i /tmp/foo.tar.gz
f6802669af74233b115780fe9979702ee76a4adf86c87072f4d771929c082d04
# bin/podman --runtime runc ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
544fb7f47829 localhost/podman-pause:5.2.0-dev-1718984608 36 seconds ago Up 12 seconds 9e9a447c71a4-infra
Repeating, and removing the --rm so I can run logs, I get:
# bin/podman --runtime runc logs e461 | cat -vET
...
^Mtop: can't open '/proc': Permission denied
Currently, when Podman restores a container into a Pod, it always fails with the following error:
Steps to reproduce this error:
This error occurs because r.state.Pod() is called in setupContainer() with the Pod name instead of the ID. This problem is fixed by setting ctrConfig.Pod to pod.ID().

Does this PR introduce a user-facing change?
None
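The root cause can be illustrated with a minimal sketch: assuming, as the description says, that the state looks pods up by their full ID, a lookup using the pod's name fails. The pod and state types below are simplified stand-ins for illustration, not Podman's actual libpod types:

```go
package main

import (
	"errors"
	"fmt"
)

// pod is a simplified stand-in for a libpod pod.
type pod struct{ id, name string }

func (p *pod) ID() string { return p.id }

// state stores pods keyed by their full ID, mirroring the assumption
// that r.state.Pod() expects an ID, not a name.
type state struct{ pods map[string]*pod }

func (s *state) Pod(id string) (*pod, error) {
	p, ok := s.pods[id]
	if !ok {
		return nil, errors.New("no such pod " + id)
	}
	return p, nil
}

func main() {
	p := &pod{id: "9e9a447c71a4", name: "test_pod"}
	st := &state{pods: map[string]*pod{p.id: p}}

	// Before the fix: the container config carried the pod *name*.
	if _, err := st.Pod(p.name); err != nil {
		fmt.Println("lookup by name fails:", err)
	}
	// The fix: store pod.ID() in the config, so the lookup succeeds.
	if _, err := st.Pod(p.ID()); err == nil {
		fmt.Println("lookup by ID succeeds")
	}
}
```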