containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

restore: fix container restore into pod #23056

Closed rst0git closed 1 week ago

rst0git commented 1 week ago

Currently, when Podman restores a container into a Pod, it always fails with the following error:

Error: cannot add container f96670b26e53e70f7f451191ea39a093c940c6c48b47218aeeef1396cb860042 to pod h2-pod: no such pod

Steps to reproduce this error:

podman pod create --name=test
podman run -d --pod=test --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
podman container checkpoint -l --export=/tmp/test.tar.gz
podman rm -a
podman pod rm -a

podman pod create --name=test
podman container restore --pod=test --import=/tmp/test.tar.gz

This error occurs because r.state.Pod() is called in setupContainer() with the Pod name instead of the Pod ID. The fix sets ctrConfig.Pod to pod.ID().
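The failure mode described above can be modeled outside of Podman. The sketch below is not libpod code: the pod table, the IDs, and the helper names `state_pod` and `lookup_pod_id` are made up for illustration. It assumes only what the description says, namely that the state lookup expects a pod ID while the user may have supplied a name.

```sh
#!/bin/sh
# Toy pod table, one "ID NAME" pair per line. All values are hypothetical.
PODS='aaa111 test
bbb222 other'

# Stand-in for a state lookup that accepts only a pod ID.
state_pod() {
    printf '%s\n' "$PODS" | awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Stand-in for a lookup that accepts a name or an ID and prints the ID.
lookup_pod_id() {
    printf '%s\n' "$PODS" | awk -v key="$1" '
        $1 == key || $2 == key { print $1; found = 1; exit }
        END { exit !found }'
}

# Passing the user-supplied name straight to the ID-only lookup fails.
state_pod test || echo "cannot add container to pod test: no such pod"

# Resolving the name to an ID first makes the same lookup succeed.
pod_id=$(lookup_pod_id test) && state_pod "$pod_id" && echo "pod found: $pod_id"
```

The point of the model is the ordering: the value handed to the ID-only lookup has to be the canonical ID, which is what storing pod.ID() in the container config achieves.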

Does this PR introduce a user-facing change?

None

Fixed container restore into Pod.

Luap99 commented 1 week ago

Please make sure to add a test

packit-as-a-service[bot] commented 1 week ago

Cockpit tests failed for commit 731fe643f029d0205cb13c570411cbff3e9ee55c. @martinpitt, @jelly, @mvollmer please check.

mheon commented 1 week ago

LGTM

edsantiago commented 1 week ago

Ran ginkgo tests on my laptop using --runc and see consistent failures of the form "expected 1 to equal 2",
https://github.com/containers/podman/blob/298f31ba6fc1efcf6630282c45bb80b3f95f2534/test/e2e/checkpoint_test.go#L1243

rst0git commented 1 week ago

Is there some CI host on which crun restore --lsm-mount-context is implemented?

The --lsm-mount-context option was implemented only in runc (https://github.com/opencontainers/runc/pull/3068). crun currently doesn't support this option. When crun is used, Podman would show the following error message:

$ sudo podman container restore --pod=test --import=/tmp/test.tar.gz
Error: runtime /usr/bin/crun does not support pod restore

rst0git commented 1 week ago

Ran ginkgo tests on my laptop using --runc and see consistent failures of the form "expected 1 to equal 2",

The tests pass when I run them locally as shown below:

$ git diff
diff --git a/test/e2e/checkpoint_test.go b/test/e2e/checkpoint_test.go
index 58198fe37..de571139a 100644
--- a/test/e2e/checkpoint_test.go
+++ b/test/e2e/checkpoint_test.go
@@ -1119,7 +1119,7 @@ var _ = Describe("Podman checkpoint", func() {
                share := share // copy into local scope, for use inside function
                index := index

-               It(testName, func() {
+               FIt(testName, func() {
                        podName := "test_pod"

                        if err := criu.CheckForCriu(criu.PodCriuVersion); err != nil {

$ sudo make localintegration FOCUS_FILE=checkpoint_test.go OCI_RUNTIME=runc
...
  Timeline >>
  > Enter [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 06:28:58.155
  < Exit [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 06:28:58.155 (0s)
  > Enter [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 06:28:58.156
  integration timing results
  Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net)       64.774286
  Podman checkpoint podman checkpoint and restore container out of and into pod (net,uts,pid)       64.943064
  Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net,uts,pid)       65.902908
  Podman checkpoint podman checkpoint and restore container out of and into pod (net,uts)       65.931323
  Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net,uts)       65.936874
  Podman checkpoint podman checkpoint and restore container out of and into pod (uts,pid)       69.145405
  < Exit [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 06:29:00.734 (2.578s)
  << Timeline
------------------------------

Ran 6 of 2226 Specs in 127.597 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 2220 Skipped

@edsantiago Would you be able to replicate this problem manually? Do you see any error messages?

podman pod create --name=test
podman run -d --pod=test --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
podman container checkpoint -l --export=/tmp/test.tar.gz
podman rm -a
podman pod rm -a

podman pod create --name=test
podman container restore --pod=test --import=/tmp/test.tar.gz

Luap99 commented 1 week ago

Is there some CI host on which crun restore --lsm-mount-context is implemented?

The --lsm-mount-context option was implemented only in runc (opencontainers/runc#3068). crun currently doesn't support this option. When crun is used, Podman would show the following error message:

$ sudo podman container restore --pod=test --import=/tmp/test.tar.gz
Error: runtime /usr/bin/crun does not support pod restore

Note we dropped all runc testing from CI recently and only test with crun. Is there any chance we can add support to crun as this is what our default is in fedora and other distros? cc @giuseppe

giuseppe commented 1 week ago

@adrianreber is --lsm-mount-context something we could also have in crun?

openshift-ci[bot] commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, Luap99, rst0git, stano45

rst0git commented 1 week ago

Is there any chance we can add support to crun as this is what our default is in fedora and other distros?

Yes, we discussed this with @adrianreber yesterday. I will try to implement support for --lsm-mount-context with crun and open a pull request.

edsantiago commented 1 week ago

I ran the tests locally; they failed on main with the reported error and pass with this patch.

Can you share your test command? This is mine, and it fails 100%:

$ sudo ./ginkgo --runc "checkpoint and restore container out of and into pod" 

(where ./ginkgo is a simple homebrewed wrapper). I also tried

$ sudo env FOCUS="checkpoint and restore container out of and into pod" OCI_RUNTIME=runc make localintegration

All tests fail with "1 is not 2". The relevant error seems to be:

Running: podman [...] restore --pod XXX -i foo.tar.gz                                       
  Error: pod XXX does not share the SOMETHING namespace

runc-1.1.12-3.fc40.x86_64 criu-3.19-4.fc40.x86_64

(I know this is merged and this is runc so we probably don't care, but I'm annoying that way)

Attachment: ginkgo script

edsantiago commented 1 week ago

FOCUS_FILE maybe? But yes, I also tried ./ginkgo --runc checkpoint_test.go. Same failures.

It worries me a little that podman container restore can barf with "Error blah blah namespace" while exiting 0.
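If a script needed to catch a command that reports an error but still exits 0, one option is to inspect stderr as well as the exit status. The sketch below is an assumption-laden illustration, not part of Podman or its test suite: `run_strict` is a hypothetical helper, and it assumes error lines go to stderr and start with `Error:`, as in the output quoted in this thread. It also relies on `mktemp`, which is widely available but not specified by POSIX.

```sh
#!/bin/sh
# Run a command; fail if it exits non-zero OR prints a line starting with
# "Error:" on stderr. The helper name is hypothetical.
run_strict() {
    err_file=$(mktemp) || return 1
    # Capture stderr in a temporary file and remember the exit status.
    "$@" 2>"$err_file" && status=0 || status=$?
    # Pass the captured stderr through so nothing is hidden from the caller.
    cat "$err_file" >&2
    if [ "$status" -eq 0 ] && grep -q '^Error:' "$err_file"; then
        status=1
    fi
    rm -f "$err_file"
    return "$status"
}

# Simulated command that prints an error but exits 0.
run_strict sh -c 'echo "Error: simulated failure" >&2' || echo "treated as failure"

# Against a live system (not executed here):
#   run_strict podman container restore --pod=test --import=/tmp/test.tar.gz
```

Matching on message text is fragile, so a check like this is best treated as a debugging aid rather than a substitute for a correct exit status.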

rst0git commented 1 week ago

@edsantiago Would you be able to run the tests in a different environment (e.g., VM)?

The following is the output of the tests on Fedora 39 with runc v1.1.12 and criu v3.19

$ sudo make localintegration FOCUS="checkpoint and restore container out of and into pod" OCI_RUNTIME=runc
...
  Timeline >>
  > Enter [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 13:48:05.147
  < Exit [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 13:48:05.147 (0s)
  > Enter [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 13:48:05.148
  integration timing results
  Podman checkpoint podman checkpoint and restore container out of and into pod (net,uts,pid)       67.081255
  Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net,uts,pid)       67.084638
  Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net)       67.194082
  Podman checkpoint podman checkpoint and restore container out of and into pod (ipc,net,uts)       68.621705
  Podman checkpoint podman checkpoint and restore container out of and into pod (net,uts)       69.706094
  Podman checkpoint podman checkpoint and restore container out of and into pod (uts,pid)       72.526000
  < Exit [SynchronizedAfterSuite] TOP-LEVEL - /home/radostin/go/src/github.com/containers/podman/test/e2e/common_test.go:200 @ 06/21/24 13:48:07.555 (2.407s)
  << Timeline
------------------------------

Ran 6 of 2226 Specs in 108.399 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 2220 Skipped

Ginkgo ran 1 suite in 2m0.206898675s
Test Suite Passed

edsantiago commented 1 week ago

Fails on f39 and f40 (fresh 1minutetip VMs).

rst0git commented 1 week ago

Fails on f39 and f40 (fresh 1minutetip VMs).

It looks like the restore command completes successfully, and restores a container with ID 01686bf7.

  Running: /root/go/podman/bin/podman --storage-opt overlay.imagestore=/tmp/podman-e2e-477465866/imagecachedir --root /tmp/podman-e2e-477465866/subtest-127715438/root --runroot /tmp/podman-e2e-477465866/subtest-127715438/runroot --runtime runc --conmon /usr/bin/conmon --network-config-dir /etc/containers/networks --network-backend netavark --cgroup-manager systemd --tmpdir /tmp/podman-e2e-477465866/subtest-127715438 --events-backend file --db-backend sqlite --storage-driver overlay container restore --pod test_pod -i /tmp/podman-e2e-477465866/subtest-127715438/checkpoint-01686bf717523235de4c1e1db90d10711deda62d4682f0fc0fdeeac2fbc32844.tar.gz
  01686bf717523235de4c1e1db90d10711deda62d4682f0fc0fdeeac2fbc32844

We expect NumberOfContainersRunning() to be 2 because both the restored container and the Pod infra container should be running.
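The same expectation can be approximated from the shell. The sketch below is not the test suite's NumberOfContainersRunning() helper; `count_running` is a hypothetical name, and it assumes that `podman ps -q` prints one container ID per line, as in the listing that follows.

```sh
#!/bin/sh
# Count container IDs in `podman ps -q` style output read from stdin
# (one ID per line), ignoring blank lines.
count_running() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Example using the single ID listed in the failing run.
running=$(printf '%s\n' '60873afce780' | count_running)
if [ "$running" -ne 2 ]; then
    echo "expected 2 running containers, got $running" >&2
fi

# Against a live system (not executed here):
#   podman ps -q | count_running
```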

  Running: /root/go/podman/bin/podman --root /tmp/podman-e2e-477465866/subtest-127715438/root --runroot /tmp/podman-e2e-477465866/subtest-127715438/runroot --runtime runc --conmon /usr/bin/conmon --network-config-dir /etc/containers/networks --network-backend netavark --cgroup-manager systemd --tmpdir /tmp/podman-e2e-477465866/subtest-127715438 --events-backend file --db-backend sqlite --storage-driver overlay ps -q
  60873afce780
  [FAILED] Expected
      <int>: 1
  to equal
      <int>: 2
  In [It] at: /root/go/podman/test/e2e/checkpoint_test.go:1231 @ 06/21/24 09:29:55.276

@edsantiago Would you be able to check which containers are running and what state the restored container is in?

edsantiago commented 1 week ago

My apologies: "Error blah blah namespace" is a red herring; it is part of the test.

Here's a reproducer on f40:

# bin/podman --runtime runc pod create --name test_pod --share uts,pid                                                             
e1752af0a7161c998e350dab58e781f8ba5dc69c83b531309b46e31cb293796f                                                                                                
# bin/podman --runtime runc run -d --rm --pod e1752 quay.io/libpod/testimage:20240123 top                                          
Trying to pull quay.io/libpod/testimage:20240123...                                                                                                             
...
f6802669af74233b115780fe9979702ee76a4adf86c87072f4d771929c082d04                                                                                                
# bin/podman --runtime runc container checkpoint -e /tmp/foo.tar.gz f6802                                                          
WARN[0033] freezer not supported: openat2 /sys/fs/cgroup/machine.slice/machine-libpod_pod_e1752af0a7161c998e350dab58e781f8ba5dc69c83b531309b46e31cb293796f.slice/libpod-f6802669af74233b115780fe9979702ee76a4adf86c87072f4d771929c082d04.scope/cgroup.freeze: no such file or directory
WARN[0033] lstat /sys/fs/cgroup/machine.slice/machine-libpod_pod_e1752af0a7161c998e350dab58e781f8ba5dc69c83b531309b46e31cb293796f.slice/libpod-f6802669af74233b115780fe9979702ee76a4adf86c87072f4d771929c082d04.scope: no such file or directory
f6802                                                                                                                                                           

# bin/podman --runtime runc ps                                                                                                     
CONTAINER ID  IMAGE                                        COMMAND     CREATED        STATUS             PORTS       NAMES                                      
2225b8721eff  localhost/podman-pause:5.2.0-dev-1718984608              2 minutes ago  Up About a minute              e1752af0a716-infra                         

# bin/podman --runtime runc pod rm e1752                                                                                           
e1752af0a7161c998e350dab58e781f8ba5dc69c83b531309b46e31cb293796f                                                                                                
# bin/podman --runtime runc pod create --name test_pod --share uts,pid                                                             
9e9a447c71a4dfb79da206d4fef0476f54a193129ac82ae2b605f352504ef5c4                                                                                                
# bin/podman --runtime runc container restore --pod 9e9a -i /tmp/foo.tar.gz                                                        
f6802669af74233b115780fe9979702ee76a4adf86c87072f4d771929c082d04                                                                                                

# bin/podman --runtime runc ps                                                                                                     
CONTAINER ID  IMAGE                                        COMMAND     CREATED         STATUS         PORTS       NAMES                                         
544fb7f47829  localhost/podman-pause:5.2.0-dev-1718984608              36 seconds ago  Up 12 seconds              9e9a447c71a4-infra                            
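In the listings above, restore prints a full container ID, yet ps afterwards shows only the infra container. A scripted version of that comparison might look like the sketch below. `id_is_listed` is a hypothetical helper; it assumes the listing comes from `podman ps -q`, which prints truncated IDs that are prefixes of the full ID.

```sh
#!/bin/sh
# Succeed if the full container ID in $1 matches one of the (possibly
# truncated) IDs read from stdin, one per line.
id_is_listed() {
    awk -v full="$1" 'NF && index(full, $1) == 1 { found = 1 } END { exit !found }'
}

# Values taken from the reproducer: the ID printed by restore and the
# only container listed afterwards.
restored=f6802669af74233b115780fe9979702ee76a4adf86c87072f4d771929c082d04
if printf '%s\n' '544fb7f47829' | id_is_listed "$restored"; then
    echo "restored container is running"
else
    echo "restored container is not running"
fi

# Against a live system (not executed here):
#   podman ps -q | id_is_listed "$restored"
```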

Repeating, and removing the --rm so I can run logs, I get:

# bin/podman --runtime runc logs e461 | cat -vET
...
^Mtop: can't open '/proc': Permission denied