containers / conmon

An OCI container runtime monitor.

podman-remote: rootless: exec /no/such/command makes container become unusable #290

Closed by edsantiago 3 years ago

edsantiago commented 3 years ago

I have a feeling this is conmon or crun or something non-podman.

Setup: with podman system service running as rootless:

$ podman-remote run --name foo -d quay.io/libpod/testimage:20210223 sh -c 'touch /foo;while [ -f /foo ]; do sleep 1;done'
51f48c694e4a2a8b580b6cb3ccef93ad47c1c79ef84722fde11fa7cfae2f5091
$ podman-remote exec foo /no/such/command
Error: executable file `/no/such/command` not found in $PATH: No such file or directory: OCI not found
ERRO[0592] error attaching to container 51f48c694e4a2a8b580b6cb3ccef93ad47c1c79ef84722fde11fa7cfae2f5091 exec session a47b26da81dce1122495741a3b4188230b908d64afb068be590c7eae20bc3e7c: executable file `/no/such/command` not found in $PATH: No such file or directory: OCI not found
$ podman-remote exec foo true

So far, so good. But now, trying to make the container exit:

$ podman-remote exec foo rm /foo
$ podman-remote ps
CONTAINER ID  IMAGE                              COMMAND               CREATED             STATUS                 PORTS   NAMES
51f48c694e4a  quay.io/libpod/testimage:20210223  sh -c touch /foo;...  About a minute ago  Up About a minute ago          foo

This should not happen. The container should be in Exited(0) state. Furthermore:

$ podman-remote exec foo true
Error: fail startup: OCI runtime error
ERRO[0609] error attaching to container 51f48c694e4a2a8b580b6cb3ccef93ad47c1c79ef84722fde11fa7cfae2f5091 exec session d9c365a6d44d619031401f02e4115436edd72356f3a25bfd1ddea17092c33231: fail startup: OCI runtime error
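
At this point podman still reports the container as running. For anyone triaging, a quick state check (a sketch, using the container name from above):

$ podman-remote inspect --format '{{.State.Status}} {{.State.Pid}}' foo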

100% reproducible. Rootless only.

kernel-5.11.9-200.fc33 podman-3.1.0-3.fc33 crun-0.19-1.fc33 conmon-2.0.27-2.fc33 containers-common-1-10.fc33

edsantiago commented 3 years ago

I can't reproduce on my laptop with source-built podman @ 6933d4611a94097681a1d8435290d9bb1c59f1f4, kernel-5.11.10-200.fc33, crun-0.19-1.fc33, conmon-2.0.27-2.fc33, containers-common-1-9.fc33.

lsm5 commented 3 years ago

@mheon @baude @vrothberg @rhatdan PTAL and ack please? @edsantiago refuses to +1 f33 bodhi until this is acked :) https://bodhi.fedoraproject.org/updates/FEDORA-2021-e70b450680

mheon commented 3 years ago

Is Conmon (the original one, for the container, not the exec sessions) going down? Potentially a segfault or something similar?
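
One way to check (a sketch; the second command assumes systemd-coredump is collecting crashes on the host):

$ pgrep -af 'conmon.*-n foo'          # is the container's original conmon still alive?
$ coredumpctl list /usr/bin/conmon    # any recorded conmon crashes?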

edsantiago commented 3 years ago

I can no longer reproduce the problem. Brand new 1minutetip VM, exactly the same setup, I even tried (from memory) running the same commands I had in the previous VM. No luck reproducing, which I suppose is good luck in the bigger picture. Sweeping this under the rug.

edsantiago commented 3 years ago

It's back, with podman-3.2.0-1.fc33.

Also crun-0.19.1-3.fc33, containers-common-1-16.fc33. All else the same as OP.

mheon commented 3 years ago

From extensive investigation:

This appears to present very similarly to containers/podman#6573 - there are zombies in the container preventing it from exiting. After 300 seconds, the failed exec session is cleaned up, the zombie disappears, the container exits. Before this happens, the container's PID 1 is alive and in interruptible sleep. Conmon is still alive and monitoring container PID1 (since it has not exited). However, we can't do another podman exec into the container, because crun thinks it is dead. I'm not sure why.
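
The zombies are visible from the host as children of the container's PID 1 (a sketch; .State.Pid is that PID as seen on the host):

$ pid=$(podman inspect --format '{{.State.Pid}}' foo)
$ ps -o pid,stat,comm --ppid "$pid"   # STAT 'Z' marks a zombie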

Conmon definitely has the fix for containers/podman#6573 and it seems to be working. I am not sure what's going on here - it doesn't seem to make any sense.

This only reproduces on 1 VM and makes absolutely no sense, so I'm abandoning for now. If it starts reproducing more regularly we can look further.

edsantiago commented 3 years ago

strace -p container-pid-1 shows an infinite loop:

wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 83
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=83, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 83
wait4(-1, 0x7ffda7798f0c, WNOHANG, NULL) = -1 ECHILD (No child processes)
stat("/foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
fork()                                  = 84
...

This surprised me at first -- I was not expecting fork()s -- but it's possible that test is not a shell builtin in the test image's sh (Alpine busybox), in which case each loop iteration forks. That would explain the SIGCHLD stream you saw.
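
One way to check that guess (a sketch; the output depends on the busybox build inside the image):

$ podman run --rm quay.io/libpod/testimage:20210223 sh -c 'type [; type test; type sleep'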

I paid close attention to the strace output when running the exec /no/such/command. Nothing different (which is what I would expect; it would be weird if the sh process noticed or cared in any way). As soon as I ran exec rm /foo, the strace output stopped. Again, exactly as expected:

wait4(-1, 0x7ffda7798f0c, WNOHANG, NULL) = -1 ECHILD (No child processes)
stat("/foo", 0x7ffda7798cf0)            = -1 ENOENT (No such file or directory)
exit_group(0)                           = ?

The not-surprising-in-retrospect thing is that strace then hung, indicating that the process was still alive. Then, as I was typing this up, it finally ended:

+++ exited with 0 +++

Presumably the 300-second conmon timeout (the --exit-delay 300 visible in the ps output below), although I didn't time it.
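
A rough way to time it on the next reproduction (a sketch; strace exits when the traced process does, so time brackets the hang):

$ pid=$(podman inspect --format '{{.State.Pid}}' foo)
$ time strace -o /dev/null -p "$pid"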

edsantiago commented 3 years ago

Here is a before-and-after ps auxww --forest. This is with the container just started:

fedora    341075  0.2  2.7 1280780 51028 pts/0   Sl   16:40   0:00  \_ podman system service --timeout=0
fedora    341081  0.6  3.1 1356176 58180 pts/0   Sl   16:40   0:00  |   \_ podman system service --timeout=0
fedora    341129  0.0  0.1   4788  2812 pts/0    S    16:40   0:00  |       \_ /usr/bin/slirp4netns --disable-host-loopback --mtu=65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /run/user/1000/netns/cni-8b6ab76c-554d-a4c0-136b-843dac422f8f tap0
...
fedora    341126  0.0  0.0   4288  1704 ?        Ss   16:40   0:00 /usr/bin/fuse-overlayfs -o ,lowerdir=/home/fedora/.local/share/containers/storage/overlay/l/6RQUMSMNYFSNLMOIPL7Q7MOI4B,upperdir=/home/fedora/.local/share/containers/storage/overlay/d8753d5f6decb1d130673b75c95f004f0a98eaee7c238d6adfa50834e58acd99/diff,workdir=/home/fedora/.local/share/containers/storage/overlay/d8753d5f6decb1d130673b75c95f004f0a98eaee7c238d6adfa50834e58acd99/work,context="system_u:object_r:container_file_t:s0:c248,c268" /home/fedora/.local/share/containers/storage/overlay/d8753d5f6decb1d130673b75c95f004f0a98eaee7c238d6adfa50834e58acd99/merged
fedora    341132  0.0  0.1  81428  2320 ?        Ssl  16:40   0:00 /usr/bin/conmon --api-version 1 -c 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 -u 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 -r /usr/bin/crun -b /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata -p /run/user/1000/containers/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/pidfile -n foo --exit-dir /run/user/1000/libpod/tmp/exits --full-attach -s -l k8s-file:/home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/ctr.log --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/user/1000/containers/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/oci-log --conmon-pidfile /run/user/1000/containers/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/fedora/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588
fedora    341135  0.0  0.0   1580   900 ?        Ss   16:40   0:00  \_ sh -c touch /foo;while [ -f /foo ]; do sleep 1;done
fedora    341147  0.0  0.0   1576     4 ?        S    16:40   0:00      \_ sleep 1

Running podman-remote exec foo true adds a couple of conmon processes, but does not add any subprocesses to system service. Those conmon processes die in five minutes.
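
To watch those exec-session conmons come and go (a sketch):

$ watch -n 10 'pgrep -af conmon'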

Running podman-remote exec foo /nonesuch, however, results in two new conmons plus a zombie:

fedora    341795  0.0  0.1  81428  2340 ?        Ssl  16:50   0:00 /usr/bin/conmon --api-version 1 -c 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 -u c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa -r /usr/bin/crun -b /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa -p /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa/exec_pid -n foo --exit-dir /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa/exit --full-attach -s -l none --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa/oci-log -e --exec-attach --exec-process-spec /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa/exec-process-141566268 --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/fedora/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg --exec --exit-command-arg c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa --exit-command-arg 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 --exit-delay 300
fedora    341799  0.0  0.0      0     0 ?        Z    16:50   0:00  \_ [3] <defunct>
fedora    341800  0.0  0.0  81428   368 ?        S    16:50   0:00  \_ /usr/bin/conmon --api-version 1 -c 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 -u c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa -r /usr/bin/crun -b /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa -p /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa/exec_pid -n foo --exit-dir /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa/exit --full-attach -s -l none --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa/oci-log -e --exec-attach --exec-process-spec /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa/exec-process-141566268 --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/fedora/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg --exec --exit-command-arg c0aa049c2d0492d1c5801cf5bf61af37a446afcbee06c89fb260528f32bb70aa --exit-command-arg 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 --exit-delay 300

Those, too, go away in five minutes.

Finally, after podman-remote exec foo /bin/rm /foo, slirp4netns becomes a zombie, the original conmons go away, and two new conmons stick around for five more minutes:

fedora    341075  0.0  2.7 1280780 51188 pts/0   Sl   16:40   0:00  \_ podman system service --timeout=0
fedora    341081  0.1  3.1 1356176 58676 pts/0   Sl   16:40   0:01  |   \_ podman system service --timeout=0
fedora    341129  0.0  0.0      0     0 pts/0    Z    16:40   0:00  |       \_ [slirp4netns] <defunct>
...
fedora    342198  0.0  0.1  81428  2320 ?        Ssl  16:56   0:00 /usr/bin/conmon --api-version 1 -c 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 -u f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b -r /usr/bin/crun -b /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b -p /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b/exec_pid -n foo --exit-dir /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b/exit --full-attach -s -l none --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b/oci-log -e --exec-attach --exec-process-spec /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b/exec-process-520419179 --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/fedora/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg --exec --exit-command-arg f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b --exit-command-arg 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 --exit-delay 300
fedora    342203  0.0  0.0  81428   376 ?        S    16:56   0:00  \_ /usr/bin/conmon --api-version 1 -c 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 -u f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b -r /usr/bin/crun -b /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b -p /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b/exec_pid -n foo --exit-dir /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b/exit --full-attach -s -l none --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b/oci-log -e --exec-attach --exec-process-spec /home/fedora/.local/share/containers/storage/overlay-containers/1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588/userdata/f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b/exec-process-520419179 --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/fedora/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg --exec --exit-command-arg f9ec0e627e70663bd53f1df8d287d2417c596914f4518c189943d778d6a30d1b --exit-command-arg 1402ff57d189f52d3f504d9c305970d32af69a74a1522e98b4180d8f80818588 --exit-delay 300

I have no idea what this means, but I also have no idea what else to look at. Am giving up for the day.
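
For later reproducers: the slirp4netns zombie can be confirmed directly (a sketch; STAT 'Z' is the zombie state):

$ ps -o pid,ppid,stat,comm -C slirp4netns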

edsantiago commented 3 years ago

Bad news. I had an f34 system to play with and decided to run a quick test. The problem reproduces on the first attempt with podman-3.2.0-3.fc34 on kernel 5.12.9-300.fc34.

edsantiago commented 3 years ago

Ugh. Two flakes just now in containers/podman#10618: Ubuntu 20.10 and 21.04. The mechanism is different -- the container is receiving a bunch of signals, with no exec /nosuch anywhere -- but the symptom is suspiciously similar: the container does not exit from its while-file-exists loop.

edsantiago commented 3 years ago

@mheon do you think containers/podman#10405 fixes this?

mheon commented 3 years ago

I don't think so. That only affects non-remote Podman - remote has unconditionally spawned these processes since the Podman 2.0 rewrite.

edsantiago commented 3 years ago

Still seeing this. podman-3.3.0-0.20.dev.git599b7d7.fc35

github-actions[bot] commented 3 years ago

A friendly reminder that this issue had no activity for 30 days.

edsantiago commented 3 years ago

I think we haven't seen this in a while. Closing again with fingers crossed.

edsantiago commented 3 years ago

Ha ha. Seeing it again. podman-3.2.3-2.fc34 conmon-2.0.29-2.fc34 crun-0.20.1-1.fc34 containers-common-1-21.fc34 kernel-5.12.10-300.fc34

Setup was: run the root gating tests, first normal, then podman-remote; then run the rootless gating tests, first normal, then podman-remote. The test failed during podman-remote testing:

 ✗ podman exec - basic test
   (from function `run_podman' in file /usr/share/podman/test/system/helpers.bash, line 213,
    in test file /usr/share/podman/test/system/075-exec.bats, line 33)
     `run_podman wait $cid' failed with status 126
   $ podman-remote rm --all --force
   $ podman-remote ps --all --external --format {{.ID}} {{.Names}}
   $ podman-remote images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
   <none>:<none> 74bc6ef15230
   quay.io/libpod/testimage:20210427 aadc32e2a626
   $ podman-remote run -d quay.io/libpod/testimage:20210427 sh -c echo L3VfCuziyBdrGjyFmC2vfFtlmcW5MLA6QiYILt56BhPIt9qEvg >/CIIdxXe0EBV2w2Hh9g3l;echo READY;while [ -f /CIIdxXe0EBV2w2Hh9g3l ]; do sleep 1; done
   463c29932c62a398e2ef6f894267ca3eb417d9a642522a501afd738f14ef536c
   $ podman-remote logs 463c29932c62a398e2ef6f894267ca3eb417d9a642522a501afd738f14ef536c
   READY
   $ podman-remote exec 463c29932c62a398e2ef6f894267ca3eb417d9a642522a501afd738f14ef536c sh -c cat /CIIdxXe0EBV2w2Hh9g3l
   L3VfCuziyBdrGjyFmC2vfFtlmcW5MLA6QiYILt56BhPIt9qEvg
   $ podman-remote exec 463c29932c62a398e2ef6f894267ca3eb417d9a642522a501afd738f14ef536c /etc
   Error: open executable: Operation not permitted: OCI permission denied
   [ rc=126 (expected) ]
   $ podman-remote exec 463c29932c62a398e2ef6f894267ca3eb417d9a642522a501afd738f14ef536c /no/such/command
   Error: executable file `/no/such/command` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found
   [ rc=127 (expected) ]
   $ podman-remote exec 463c29932c62a398e2ef6f894267ca3eb417d9a642522a501afd738f14ef536c rm -f /CIIdxXe0EBV2w2Hh9g3l
   $ podman-remote wait 463c29932c62a398e2ef6f894267ca3eb417d9a642522a501afd738f14ef536c
   timeout: sending signal TERM to command ‘podman-remote’
   [ rc=124 (** EXPECTED 0 **) ]
   *** TIMED OUT ***
   # [teardown]
   $ podman-remote pod rm --all --force
   $ podman-remote rm --all --force
   Error: cannot remove container 463c29932c62a398e2ef6f894267ca3eb417d9a642522a501afd738f14ef536c as it could not be stopped: given PIDs did not die within timeout
   [ rc=125 ]
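
To rerun just that test file outside the full gating run, the bats invocation style shown in the next comment works (a sketch; run as the same user the failing tests ran as):

$ PODMAN=podman-remote bats /usr/share/podman/test/system/075-exec.bats
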
vrothberg commented 3 years ago

I'll take a look :vulcan_salute:

edsantiago commented 3 years ago

OK, I have a (slightly) better reproducer. It's exactly the same as the first comment, except that before running it you need to do this as root:

# podman system service -t 0 &>/dev/null &
# PODMAN=podman-remote bats /usr/share/podman/test/system/*exec.bats
...tests will pass...
# kill %1

That is: you need to run podman-remote exec tests as root. I don't know why, and I don't know how root-podman-exec can affect rootless-podman-exec. But there you go. Once you've run those tests as root, the rootless reproducer in comment 0 will start triggering.

EDITED TO ADD: podman-3.3.0-0.2.rc1.fc34 on 5.12.10-300.fc34, which is as modern as I can get.
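
Putting it all together, the full sequence is roughly (a sketch combining the root step above with the comment-zero reproducer):

# podman system service -t 0 &>/dev/null &
# PODMAN=podman-remote bats /usr/share/podman/test/system/*exec.bats
# kill %1

$ # now as a rootless user, with a rootless podman system service running:
$ podman-remote run --name foo -d quay.io/libpod/testimage:20210223 \
    sh -c 'touch /foo; while [ -f /foo ]; do sleep 1; done'
$ podman-remote exec foo /no/such/command
$ podman-remote exec foo rm /foo
$ podman-remote ps    # bug: the container stays Up instead of Exited (0)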

vrothberg commented 3 years ago

Thanks, Ed! I'll be on PTO next week, so others may continue looking into this issue.

edsantiago commented 3 years ago

Still present in podman-3.3.0-1, on both f33 and f34.

vrothberg commented 3 years ago

I'm unable to reproduce on the latest main branch or on v3.3.0.

edsantiago commented 3 years ago

Still reproduces for me, podman-3.3.0-1.fc34.x86_64, on the very first try. Did you miss the part where you need to do magic steps as root (my Aug 5 comment), and then run the comment-zero reproducer as rootless?

vrothberg commented 3 years ago

Having another look now

vrothberg commented 3 years ago

Nope. I am unable to reproduce on the main branch :( @edsantiago, let's put our heads together when you're back.