I cannot reproduce on Fedora 33 with either runc or crun. @apinter, could you retry with crun?
I could, but it is only available from unofficial repos, which I really don't feel like adding to my server. Is there anything I could check/fix with the current criu/runc installed?
Should I report a bug on the openSUSE side maybe?
Can you rerun with podman --log-level=debug? A look at journalctl may reveal some more details. Trying crun may help isolate the error source.

Should I report a bug on the openSUSE side maybe?

I suggest waiting a bit. If we don't find a reproducer, it may be worth opening a bug there.
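Something like this for the debug rerun (a sketch; the container name test and the grep pattern are only illustrative):

# podman --log-level=debug container checkpoint --keep --leave-running --export=./test.tar.gz test
# journalctl -b --no-pager | grep -iE 'podman|runc|criu' | tail -n 50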
I did, this is what happened:
# podman container checkpoint --log-level=debug --keep --leave-running --export=./test.tar.gz test
INFO[0000] podman filtering at log level debug
DEBU[0000] Called checkpoint.PersistentPreRunE(podman container checkpoint --log-level=debug --keep --leave-running --export=./test.tar.gz test)
DEBU[0000] Reading configuration file "/usr/share/containers/containers.conf"
DEBU[0000] Merged system config "/usr/share/containers/containers.conf": &{Containers:{Devices:[] Volumes:[] ApparmorProfile:containers-default-0.29.0 Annotations:[] CgroupNS:host Cgroups:enabled DefaultCapabilities:[CAP_AUDIT_WRITE CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_FSETID CAP_KILL CAP_MKNOD CAP_NET_BIND_SERVICE CAP_NET_RAW CAP_SETFCAP CAP_SETGID CAP_SETPCAP CAP_SETUID CAP_SYS_CHROOT] DefaultSysctls:[] DefaultUlimits:[nproc=32768:32768] DefaultMountsFile: DNSServers:[] DNSOptions:[] DNSSearches:[] EnableKeyring:true EnableLabeling:false Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm] EnvHost:false HTTPProxy:false Init:false InitPath:/usr/bin/catatonit IPCNS:private LogDriver:k8s-file LogSizeMax:-1 NetNS:bridge NoHosts:false PidsLimit:2048 PidNS:private SeccompProfile:/usr/share/containers/seccomp.json ShmSize:65536k TZ: Umask:0022 UTSNS:private UserNS:host UserNSSize:65536} Engine:{ImageBuildFormat:oci CgroupCheck:false CgroupManager:systemd ConmonEnvVars:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] ConmonPath:[/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] DetachKeys:ctrl-p,ctrl-q EnablePortReservation:true Env:[] EventsLogFilePath:/var/run/libpod/events/events.log EventsLogger:journald HooksDir:[/usr/share/containers/oci/hooks.d] ImageDefaultTransport:docker:// InfraCommand: InfraImage:k8s.gcr.io/pause:3.2 InitPath:/usr/libexec/podman/catatonit LockType:shm MultiImageArchive:false Namespace: NetworkCmdPath: NoPivotRoot:false NumLocks:2048 OCIRuntime:runc OCIRuntimes:map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] PullPolicy:missing Remote:false RemoteURI: RemoteIdentity: ActiveService: ServiceDestinations:map[] RuntimePath:[] RuntimeSupportsJSON:[crun runc] RuntimeSupportsNoCgroups:[crun] RuntimeSupportsKVM:[kata kata-runtime kata-qemu kata-fc] SetOptions:{StorageConfigRunRootSet:false StorageConfigGraphRootSet:false StorageConfigGraphDriverNameSet:false StaticDirSet:false VolumePathSet:false TmpDirSet:false} SignaturePolicyPath:/etc/containers/policy.json SDNotify:false StateType:3 StaticDir:/var/lib/containers/storage/libpod StopTimeout:10 TmpDir:/var/run/libpod VolumePath:/var/lib/containers/storage/volumes} Network:{CNIPluginDirs:[/usr/libexec/cni] DefaultNetwork:podman NetworkConfigDir:/etc/cni/net.d/}}
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver btrfs
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Using volume path /var/lib/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "btrfs"
DEBU[0000] Initializing event backend journald
DEBU[0000] using runtime "/usr/bin/runc"
WARN[0000] Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument
WARN[0000] Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
WARN[0000] Default CNI network name podman is unchangeable
INFO[0000] Setting parallel job count to 13
DEBU[0000] Trying to checkpoint container b9e7d06fc4249992f90c4a9822f3a89f54e78a375b12864ed71f600c6bbb528f
DEBU[0000] Writing checkpoint to /var/lib/containers/storage/btrfs-containers/b9e7d06fc4249992f90c4a9822f3a89f54e78a375b12864ed71f600c6bbb528f/userdata/checkpoint
DEBU[0000] Writing checkpoint logs to /var/lib/containers/storage/btrfs-containers/b9e7d06fc4249992f90c4a9822f3a89f54e78a375b12864ed71f600c6bbb528f/userdata
ERRO[0000] read unixpacket @->@: EOF
Error: `/usr/bin/runc checkpoint --image-path /var/lib/containers/storage/btrfs-containers/b9e7d06fc4249992f90c4a9822f3a89f54e78a375b12864ed71f600c6bbb528f/userdata/checkpoint --work-path /var/lib/containers/storage/btrfs-containers/b9e7d06fc4249992f90c4a9822f3a89f54e78a375b12864ed71f600c6bbb528f/userdata --leave-running b9e7d06fc4249992f90c4a9822f3a89f54e78a375b12864ed71f600c6bbb528f` failed: exit status 1
@adrianreber, do you know what may be going wrong here?
I guess this is because of btrfs: https://criu.org/Filesystems_pecularities

Sorry, but btrfs is known not to be supported correctly. Upstream in CRIU we do not get many questions about btrfs, so nobody has looked into fixing it so far. In Bugzilla I would close this ticket as CANTFIX. If you can, please try it with another filesystem.
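One way to try another filesystem without reinstalling would be to point Podman's storage root at a loopback-mounted image (a sketch, assuming ext4; all paths are illustrative and you may need to pass --runroot as well):

# truncate -s 10G /var/tmp/ext4.img
# mkfs.ext4 -F /var/tmp/ext4.img
# mkdir -p /mnt/ext4
# mount -o loop /var/tmp/ext4.img /mnt/ext4
# podman --root /mnt/ext4/storage run -d --name test nginx
# podman --root /mnt/ext4/storage container checkpoint --keep --leave-running --export=./test.tar.gz test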
Oh, in the userdata directory there should be a log file from CRIU called dump.log. That would be interesting to see. But I am pretty sure this happens because of btrfs.
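Based on the paths in the debug output above, that would be something like this (the container ID differs per container):

# ls -l /var/lib/containers/storage/btrfs-containers/b9e7d06fc4249992f90c4a9822f3a89f54e78a375b12864ed71f600c6bbb528f/userdata/
# cat /var/lib/containers/storage/btrfs-containers/b9e7d06fc4249992f90c4a9822f3a89f54e78a375b12864ed71f600c6bbb528f/userdata/dump.log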
Had a feeling that btrfs might be the cause of the issue. Unfortunately CRIU doesn't even create a dump.log to get more information from. Guess I could move the container into a pod, just play it on another server, and use btrfs send/receive to transfer the volume as well.
Considering F33 is on btrfs by default too, I'm wondering if it fails there as well.
I think only F33 Workstation is using btrfs. Only new installs and only Workstation. If you do an upgrade you will stay on the existing FS. But yes, I also expect more people to complain about it.
Did a little test with openSUSE installed on xfs and yes, you're right. The problem is btrfs:
# podman container checkpoint --keep --leave-running --export=./test.tar.gz test
ERRO[0000] container is not destroyed
34cf63cfb9643d79a88cb18b16d12fb82ffcb26c3d77b4ed0c3648a1350e0e70
# l
total 348
drwx------ 5 root root 87 Feb 17 11:32 ./
drwxr-xr-x 21 root root 244 Feb 17 11:32 ../
-rw------- 1 root root 466 Feb 17 11:29 .bash_history
drwx------ 2 root root 6 Mar 7 2020 .gnupg/
drwxr-xr-x 2 root root 6 Mar 7 2020 bin/
drwxr-xr-x 4 root root 48 Feb 17 10:56 inst-sys/
-rw------- 1 root root 351630 Feb 17 11:32 test.tar.gz
# podman container checkpoint --keep --export=./test2.tar.gz test
34cf63cfb9643d79a88cb18b16d12fb82ffcb26c3d77b4ed0c3648a1350e0e70
localhost:~ # l
total 692
drwx------ 5 root root 107 Feb 17 11:32 ./
drwxr-xr-x 21 root root 244 Feb 17 11:32 ../
-rw------- 1 root root 466 Feb 17 11:29 .bash_history
drwx------ 2 root root 6 Mar 7 2020 .gnupg/
drwxr-xr-x 2 root root 6 Mar 7 2020 bin/
drwxr-xr-x 4 root root 48 Feb 17 10:56 inst-sys/
-rw------- 1 root root 351630 Feb 17 11:32 test.tar.gz
-rw------- 1 root root 351814 Feb 17 11:32 test2.tar.gz
However, migrating a container from an xfs to a btrfs system seemingly works, as the container loaded, but the log says that it failed :/
$ sudo podman ps
[sudo] password for root:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$ sudo podman container restore -i test2.tar.gz
Trying to pull docker.io/library/nginx:latest...
Getting image source signatures
Copying blob 66e650438339 done
Copying blob 45b42c59be33 done
Copying blob 76a3dfe4406b done
Copying blob d0d9e9ea897e done
Copying blob 410ff9d97480 done
Copying config 298ec0e287 done
Writing manifest to image destination
Storing signatures
Error: OCI runtime error: criu failed: type NOTIFY errno 0
log file: /var/lib/containers/storage/btrfs-containers/34cf63cfb9643d79a88cb18b16d12fb82ffcb26c3d77b4ed0c3648a1350e0e70/userdata/restore.log
$ sudo podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
34cf63cfb964 docker.io/library/nginx:latest nginx -g daemon o... 34 seconds ago Created test
$ sudo podman start test
test
$ sudo podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
34cf63cfb964 docker.io/library/nginx:latest nginx -g daemon o... 47 seconds ago Up 6 seconds ago test
Considering the length of the log, I posted it here.

Guess the next question is how to get it to work on btrfs? Wondering if setting noCOW on the container store would be a workaround for this, or creating a new subvolume and making it completely noCOW.
The restore did not work. It seems you are hitting an error in the error path; the container should not be in the Created state after a failed restore. Migrating nginx does not really make much sense as it is stateless. Try something else. I am using the following test container for tests like this:
# podman run -d quay.io/adrianreber/counter
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 0
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 1
If you migrate this you can be sure it is a real checkpoint/restore, because the next answer should be 'counter: 2'.
To get it working on btrfs you have to go to upstream CRIU and fix it there. There is nothing Podman can do to fix this.
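For example, a full migration test could look like this (a sketch; other-host is a placeholder and ssh access between the machines is assumed):

# podman run -d --name counter quay.io/adrianreber/counter
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 0
# podman container checkpoint --export=/tmp/counter.tar.gz counter
# scp /tmp/counter.tar.gz other-host:/tmp/
Then, on the destination host:
# podman container restore -i /tmp/counter.tar.gz
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 1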
That is an awesome testing container, thank you. Anyhow, I did set noCOW on /var/lib/containers and tried again on btrfs.
Exported the container without error, deleted the runtime container, and restored it from the export. Setting noCOW solved the issue:
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 6
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 7
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 8
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 9
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 10
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 11
# podman container checkpoint --keep --export=./test11.tar.gz awesome_goldstine
b9027b522b432e854307ee96358b9fc8e02f24cbd1b2c4a5b53a868be4d7db90
# podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b9027b522b43 quay.io/adrianreber/counter:latest 4 minutes ago Exited (0) 4 seconds ago awesome_goldstine
# podman rm awesome_goldstine
b9027b522b432e854307ee96358b9fc8e02f24cbd1b2c4a5b53a868be4d7db90
# l
total 6400
drwxr-xr-x 1 root root 578 Feb 17 15:51 ./
drwxr-xr-x 1 root root 114 Feb 17 15:40 ../
drwxr-xr-x 1 root root 0 May 16 2020 GeoIP/
drwx------ 1 root root 0 Jun 25 2020 NetworkManager/
drwxr-xr-x 1 root root 418 Feb 17 15:41 YaST2/
drwxr-xr-x 1 root root 394 Feb 17 15:34 alternatives/
drwxr-xr-x 1 root root 10 Feb 17 15:33 apparmor/
drwxr-xr-x 1 root root 16 Feb 17 15:35 autoinstall/
drwxr-xr-x 1 root root 70 Feb 17 15:31 ca-certificates/
drwxr-x--- 1 chrony chrony 0 Jun 10 2020 chrony/
drwx------ 1 root root 30 Feb 17 15:47 cni/
drwx------ 1 root root 24 Feb 17 15:46 containers/
drwxr-xr-x 1 root root 20 Feb 17 15:33 dbus/
drwxr-xr-x 1 root root 30 Feb 17 15:34 dhcp/
drwxr-xr-x 1 root root 32 Feb 17 15:34 dhcp6/
drwx------ 1 root root 8 Feb 17 15:40 ebtables/
drwxr-xr-x 1 root root 0 Mar 7 2020 empty/
drwxr-xr-x 1 root root 36 Feb 17 15:28 hardware/
drwxr-xr-x 1 root root 0 Sep 21 2019 lifecycle/
drwxr-xr-x 1 root root 38 Feb 17 15:37 misc/
drwxr-xr-x 1 root root 0 May 17 2020 net-snmp/
drwxr-xr-x 1 root root 56 Feb 17 15:34 nfs/
drwxr-xr-x 1 nobody root 0 Jun 9 2020 nobody/
drwxr-xr-x 1 root root 54 Feb 17 15:40 nscd/
drwxr-xr-x 1 root root 0 May 16 2020 os-prober/
drwxr-xr-x 1 root root 26 Feb 17 15:35 plymouth/
drwxr-xr-x 1 root root 0 May 17 2020 polkit/
drwx------ 1 postfix root 22 Feb 17 15:40 postfix/
lrwxrwxrwx 1 root root 26 Feb 17 15:35 rpm -> ../../usr/lib/sysimage/rpm/
drwxr-xr-x 1 root root 68 Feb 17 15:35 samba/
drwxr-xr-x 1 root root 22 Feb 17 15:33 smartmontools/
drwxr-xr-x 1 root root 0 Jun 9 2020 sshd/
drwx--x--x 1 root root 20 Feb 17 15:41 sudo/
drwxr-xr-x 1 root root 104 Feb 17 15:40 systemd/
-rw------- 1 root root 2181425 Feb 17 15:49 test.tar.gz
-rw------- 1 root root 2181134 Feb 17 15:51 test11.tar.gz
-rw------- 1 root root 2181397 Feb 17 15:50 test2.tar.gz
drwxr-xr-x 1 root root 0 May 17 2020 usb_modeswitch/
drwxr-xr-x 1 root root 0 Jun 6 2020 vmware/
drwxr-x--- 1 root root 80 Feb 17 15:50 wicked/
drwxr-xr-x 1 root root 136 Feb 17 15:43 zypp/
# podman container restore -i test11.tar.gz
b9027b522b432e854307ee96358b9fc8e02f24cbd1b2c4a5b53a868be4d7db90
# curl `podman inspect -l --format "{{.NetworkSettings.IPAddress}}"`:8088
counter: 12
Thanks for your help and guidance, appreciated!
Nice. Just to confirm once more for me: setting noCOW on the storage backend directory enables you to checkpoint and restore containers on btrfs?
Yep, all it takes is chattr +C /var/lib/containers. Following that, I see two options: either leave it with noCOW, or set it back to COW with chattr -C /var/lib/containers (although I did not test whether setting it back to COW would cause any issue, it seems unlikely).
Until criu is ready for btrfs I think this is a good enough solution ^_^
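For anyone landing here later, the whole workaround boils down to this (note that chattr +C only affects files created after the flag is set, so it is best applied before any containers are stored there):

# chattr +C /var/lib/containers        (enable No_COW for newly created files)
# lsattr -d /var/lib/containers        (verify that the C attribute is now set)
# chattr -C /var/lib/containers        (revert to COW later if desired)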
Thanks for the details. I added it to CRIU's wiki: https://criu.org/Filesystems_pecularities#BTRFS_Workaround
If this works for you I would say you could close this issue.
Super! Thank you again for your help!
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug

Description

I'm trying to export a test container running nginx, named test, with podman container checkpoint --keep --leave-running --export=/root/test.tar.gz test, but for some reason it fails.

Steps to reproduce the issue:

podman run -d --name test nginx
podman container checkpoint --keep --leave-running --export=./test.tar.gz test

Describe the results you received:

It fails with:

The container also stops.

Describe the results you expected:

Expected to create and export a container checkpoint.
Additional information you deem important (e.g. issue happens only occasionally):

I'm running Podman on openSUSE MicroOS and Tumbleweed; both show the same error.

Output of podman version:

Output of podman info --debug:

Package info (e.g. output of rpm -q podman or apt list podman):

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes
Additional environment details (AWS, VirtualBox, physical, etc.): Running on physical servers.