containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

CI: podman checkpoint container with --pre-checkpoint not working in container testing #24230

Open Luap99 opened 2 days ago

Luap99 commented 2 days ago

With the latest image update (https://github.com/containers/podman/pull/24227) checkpoint is broken inside the container test:

→ Enter [It] podman checkpoint container with --pre-checkpoint - /var/tmp/go/src/github.com[/containers/podman/test/e2e/checkpoint_test.go:969](https://github.com/containers/podman/blob/ee70c495901ce4865b8a61290700c027eabd7937/test/e2e/checkpoint_test.go#L969) @ 10/10/24 14:04:37.825
           # podman [options] run -d --network podman5 quay.io/libpod/alpine:latest top
           6d1f1d2b3d02e8d920b33038860e7bfdf077712b3f99389a1866be88393ab22c
           # podman [options] container checkpoint -P 6d1f1d2b3d02e8d920b33038860e7bfdf077712b3f99389a1866be88393ab22c
           *** buffer overflow detected ***: terminated
           CRIU feature checking failed -52.  Please check CRIU logfile /tmp/CI_Nlm2/podman-e2e-190218032/subtest-2996264589/root/overlay-containers/6d1f1d2b3d02e8d920b33038860e7bfdf077712b3f99389a1866be88393ab22c/userdata/dump.log
           Error: `/usr/bin/crun checkpoint --image-path /tmp/CI_Nlm2/podman-e2e-190218032/subtest-2996264589/root/overlay-containers/6d1f1d2b3d02e8d920b33038860e7bfdf077712b3f99389a1866be88393ab22c/userdata/pre-checkpoint --work-path /tmp/CI_Nlm2/podman-e2e-190218032/subtest-2996264589/root/overlay-containers/6d1f1d2b3d02e8d920b33038860e7bfdf077712b3f99389a1866be88393ab22c/userdata --pre-dump 6d1f1d2b3d02e8d920b33038860e7bfdf077712b3f99389a1866be88393ab22c` failed: exit status 1

           [FAILED] Command failed with exit status 125. See above for error message.

Both "podman checkpoint container with --pre-checkpoint" and "podman checkpoint container with --pre-checkpoint and export (migration)" fail the same way.

https://api.cirrus-ci.com/v1/artifact/task/5294903477927936/html/int-podman-fedora-40-root-container-sqlite.log.html

I don't have time to look into this, so I am just going to skip the tests. I am filing this issue so we can track it.

edsantiago commented 2 days ago

See https://github.com/containers/automation_images/pull/387#issuecomment-2404942252 , in particular, the criu 4.0 update:

        debian     prior-fedora  fedora  fedora-aws  rawhide
criu    3.17.1-3   3.19-2        4.0-1   3.19-4      4.0-1
3.19-6 ⇑ 3.19-7 ⇑

Luap99 commented 1 day ago

Reproducer:

$ sudo bin/podman run --rm --privileged --net=host --cgroupns=host -v /var/lib/containers -v $(pwd):/repo -w /repo -v /tmp:/tmp -it quay.io/libpod/fedora_podman:c20241010t105554z-f40f39d13 bash

[root@pholzing-fedora repo]# bin/podman run -d quay.io/libpod/alpine:latest top
8a080765b0f5aed1138e6ffb0d6c1c04a48aee93cf96776ba7059b6e775e8be8
[root@pholzing-fedora repo]# bin/podman container checkpoint -P test
*** buffer overflow detected ***: terminated
2024-10-11T14:58:53.008984Z: CRIU feature checking failed -52.  Please check CRIU logfile /var/lib/containers/storage/overlay-containers/8a080765b0f5aed1138e6ffb0d6c1c04a48aee93cf96776ba7059b6e775e8be8/userdata/dump.log
Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/8a080765b0f5aed1138e6ffb0d6c1c04a48aee93cf96776ba7059b6e775e8be8/userdata/pre-checkpoint --work-path /var/lib/containers/storage/overlay-containers/8a080765b0f5aed1138e6ffb0d6c1c04a48aee93cf96776ba7059b6e775e8be8/userdata --pre-dump 8a080765b0f5aed1138e6ffb0d6c1c04a48aee93cf96776ba7059b6e775e8be8` failed: exit status 1

The CRIU log file was empty, so there is nothing useful to see in there.
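
For anyone re-running the reproducer, here is a minimal sketch of a helper that distinguishes a missing log from an empty one and prints the tail when there is content. The function name `show_criu_log` is made up for illustration; the path to pass is whatever `dump.log` location the error message prints.

```shell
# Report whether a CRIU log file has any content, and show its tail if so.
# Usage: show_criu_log /path/to/userdata/dump.log
show_criu_log() {
    local log="$1"
    if [ ! -e "$log" ]; then
        echo "missing: $log"
        return 2
    fi
    if [ ! -s "$log" ]; then
        # -s is true only for files larger than zero bytes
        echo "empty: $log"
        return 1
    fi
    echo "last lines of $log:"
    tail -n 20 "$log"
}
```

An empty file, rather than a missing one, would suggest that the log was opened but the process died before writing anything to it.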

Using a normal Fedora image as the base and then installing podman does not seem to reproduce the failure. I tried both criu-3.19-4 and criu-4.0-1, so there must be some magic in our special test image.
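
One way to narrow down that "magic" could be to diff the installed package lists of the two images. The sketch below assumes the difference lies in packages at all, which is not confirmed anywhere in this thread. `pkg_diff` is a hypothetical helper name and the file names in the usage comment are placeholders.

```shell
# Print packages that differ between two `rpm -qa` listings.
# Lines prefixed "<" exist only in the first list, ">" only in the second.
#
# Example (file names are placeholders):
#   rpm -qa > /tmp/test-image.txt     # inside the CI test image
#   rpm -qa > /tmp/plain-fedora.txt   # inside a plain Fedora image
#   pkg_diff /tmp/test-image.txt /tmp/plain-fedora.txt
pkg_diff() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    # comm requires sorted input; use the C locale for a stable order
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b" | sed 's/^/< /'
    LC_ALL=C comm -13 "$a" "$b" | sed 's/^/> /'
    rm -f "$a" "$b"
}
```

Because the reproducer mounts `/tmp` into the container, listings written there from inside the test image are also visible on the host.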

@adrianreber @rst0git Any ideas what could cause *** buffer overflow detected ***: terminated?
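
To check whether the abort comes from CRIU itself rather than from crun, the feature check can be run directly. This is only a sketch. It assumes that the check crun performs for `--pre-dump` corresponds to CRIU's `mem_dirty_track` feature. `check_mem_track` and the `CRIU` override variable are hypothetical names introduced here.

```shell
# Run CRIU's dirty-memory-tracking feature check directly, bypassing crun.
# Set CRIU to point at a different binary (hypothetical override variable).
check_mem_track() {
    local criu="${CRIU:-criu}" rc=0
    "$criu" check --feature mem_dirty_track || rc=$?
    # Exit codes above 128 mean the process was killed by a signal.
    # glibc aborts after "*** buffer overflow detected ***", so a fortify
    # failure would show up as 134 (128 + SIGABRT).
    if [ "$rc" -gt 128 ]; then
        echo "criu killed by signal $((rc - 128))"
    else
        echo "criu exited with status $rc"
    fi
    return "$rc"
}
```

If this reports signal 6 inside the test image but a normal exit status in a plain Fedora image, the overflow is being detected inside the criu process, independent of crun.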