edsantiago closed this issue 2 years ago
@adrianreber @rst0git PTAL
@lsm5 FYI this is for the other two failures
@edsantiago this is a known issue (https://github.com/checkpoint-restore/criu/issues/1696). There is a pull request for CRIU https://github.com/checkpoint-restore/criu/pull/1706 and a workaround in https://github.com/checkpoint-restore/criu/commit/d99def7dcfa938918368c91021f72a77f738bc61
@rst0git thank you
@containers/podman-maintainers the abovementioned CRIU PR has been open for one month; CI is red, there is no indication of when it will merge and then when we'll get a new version. Should we disable checkpoint tests in rawhide for the next few months, so we can pass gating tests?
@edsantiago I opened a pull request for go-criu with the workaround mentioned above: https://github.com/checkpoint-restore/go-criu/pull/61
This should allow the tests in CI to pass.
@rst0git aha - the part that was not obvious to me is that checkpoint-restore/go-criu is vendored in podman (currently v5.3.0). Presumably, if/when your PR merges, podman can bump go.mod to vendor in a (tagged? untagged?) go-criu. Leaving this here for the benefit of anyone else unfamiliar with criu and its integration in podman. (If I am mistaken in anything, please correct me!) Thanks again.
Can you try to export GLIBC_TUNABLES=glibc.pthread.rseq=0 to see if this makes the error go away?
@lsm5 ^^ might be worth a try. The place to do it is the test yaml, adding a new environment stanza, but the hard part is getting that in the right place and with the right indentation and all that yamly stuff. If you feel comfortable yamling, this might be an easy way to get tests passing (assuming it works). If you're like me, and would need to spend an hour moving the minuses and spaces, it might be more trouble than it's worth.
@adrianreber the suggested envariable makes no difference that I can see:
# GLIBC_TUNABLES=glibc.pthread.rseq=0 bats /usr/share/podman/test/system/*checkpoint.bats
✗ podman checkpoint - basic test
...
# podman container restore 55055c549b8ae0b6ecdc2f1b7c234dd239b9096f40bf3591c3b27464cb4080fa
Error: OCI runtime error: crun: CRIU restoring failed -52. Please check CRIU logfile /var/lib/containers/storage/overlay-containers/55055c549b8ae0b6ecdc2f1b7c234dd239b9096f40bf3591c3b27464cb4080fa/userdata/restore.log
✗ podman checkpoint --export, with volumes
# podman container restore --import=/tmp/podman_bats.NX0UiV/c_Nl0hlYaSfs.tar.gz
Error: OCI runtime error: crun: CRIU restoring failed -52. Please check CRIU logfile /var/lib/containers/storage/overlay-containers/7fc7b146f81bcd918e685207cd7ac310658a8f5f3a926bd0ec03a7d887917b4b/userdata/restore.log
I think we call the OCI runtime with a cleared environment, so GLIBC_TUNABLES will not be set for crun/runc.
Concur. I think there is a way to set environment variables for Conmon in containers.conf, but I'm also not aware of us ever needing to use it in the last 2 years, so it might not even work.
(The field in containers.conf is conmon_env_vars=[])
Now, whether Conmon will call the OCI runtime with its full environment, I don't know - I only know that we clear environment before starting Conmon.
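To illustrate why the exported variable may never reach crun/runc, here is a minimal sketch of what a cleared environment does to a child process. It uses plain `env -i` as a stand-in; it is not how podman or Conmon actually launch the runtime, only an analogy for the behavior described above.

```shell
# Export the tunable in the current shell, as suggested earlier in the thread.
export GLIBC_TUNABLES=glibc.pthread.rseq=0

# A child started normally inherits the variable.
inherited=$(sh -c 'printf %s "${GLIBC_TUNABLES:-unset}"')

# A child started with a cleared environment (env -i) does not.
# /bin/sh is given by absolute path because PATH is cleared too.
cleared=$(env -i /bin/sh -c 'printf %s "${GLIBC_TUNABLES:-unset}"')

echo "inherited: $inherited"
echo "cleared:   $cleared"
```

If the runtime is started the second way, setting GLIBC_TUNABLES on the bats command line would have no effect on it, which would match the unchanged -52 result.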
@mheon thank you! With this /etc/containers/containers.conf:
[engine]
conmon_env_vars = [ "GLIBC_TUNABLES=glibc.pthread.rseq=0" ]
...I get -70 instead of -52:
# podman container restore 3bfa
Error: OCI runtime error: crun: CRIU restoring failed -70. Please check CRIU logfile /var/lib/containers/storage/overlay-containers/3bfa78a5d79fcec27e0c38a6e880375e85fcc2ee304f18ef45ca86d38c28068a/userdata/restore.log
This time, the pointed-to log file is empty (size zero) so there's nothing to attach. (And no, I'm not out of disk space).
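For anyone wanting to repeat this experiment without editing /etc/containers/containers.conf, a possible sketch follows. It assumes that podman honors the CONTAINERS_CONF environment variable as an alternative config path; the scratch file name comes from mktemp and the container ID is left as a placeholder.

```shell
# Write the same [engine] stanza to a scratch file instead of /etc.
conf=$(mktemp)
cat >"$conf" <<'EOF'
[engine]
conmon_env_vars = [ "GLIBC_TUNABLES=glibc.pthread.rseq=0" ]
EOF

# Assumption: podman reads its configuration from this path when set.
export CONTAINERS_CONF="$conf"

# Then retry the failing step (requires podman and a checkpointed container):
# podman container restore <container-id>
```

Unsetting CONTAINERS_CONF and deleting the scratch file returns to the system configuration.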
The bug is almost certainly in the handling of conmon_env_vars because:
- using GLIBC_NONTUNABLES instead yields the same -70 error; and
- removing the conmon_env_vars line reverts back to -52 with a full logfile.
We've updated the Fedora Rawhide package for CRIU with support for rseq: https://koji.fedoraproject.org/koji/buildinfo?buildID=1911510
Thank you. I've confirmed that criu-3.16.1-6.fc36 fixes the problem and is now in fc36 stable.
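A quick way to check whether a given machine already has the fixed package is sketched below. The threshold 3.16.1-6.fc36 is the build confirmed above. The helper name version_at_least is made up for illustration, and it relies on GNU `sort -V`, which only approximates RPM's own version comparison.

```shell
# First criu build reported in this thread to fix the restore failure.
fixed="3.16.1-6.fc36"

# Hypothetical helper: succeeds if $1 >= $2 under GNU sort -V ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Query the installed criu package, if rpm and criu are present.
if command -v rpm >/dev/null 2>&1 &&
   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' criu 2>/dev/null); then
    if version_at_least "$installed" "$fixed"; then
        echo "criu $installed: includes the rseq fix"
    else
        echo "criu $installed: older than $fixed, restore may still fail"
    fi
else
    echo "criu is not installed via rpm here; nothing to check"
fi
```

With the versions from this report, criu-3.16.1-4.fc36 would fall in the "older" branch and criu-3.16.1-6.fc36 in the "fix" branch.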
Reproducible from the very first try
criu-restore.log
podman-4.0.0-0.1.rc1.fc36.x86_64 criu-3.16.1-4.fc36.x86_64 5.17.0-0.rc0.20220112gitdaadb3bd0e8d.63.fc36.x86_64