chimera-linux / cports

Chimera ports collection
BSD 2-Clause "Simplified" License

podman: cannot stop rootless container created with --pid host #1718

Open JamiKettunen opened 6 months ago

JamiKettunen commented 6 months ago

This is my last major blocker for adding distrobox into the repo, reproducible with:

podman create \
--name test --ipc host --network host \
--privileged --security-opt label=disable \
--user root:root --pid host \
--volume /:/run/host:rslave \
--volume /dev:/dev:rslave \
--volume /sys:/sys:rslave \
--volume /tmp:/tmp:rslave \
--volume "$HOME":"$HOME":rslave \
--volume /etc/hosts:/etc/hosts:ro \
--volume /etc/localtime:/etc/localtime:ro \
--volume /etc/resolv.conf:/etc/resolv.conf:ro \
--ulimit host --annotation run.oci.keep_original_groups=1 \
--mount type=devpts,destination=/dev/pts \
--userns keep-id \
alpine:latest \
sleep infinity

podman start test
podman ps
podman --log-level debug stop test

with the last command output containing:

DEBU[0000] Sending signal 15 to container ... 
DEBU[0010] Timed out stopping container ... with SIGTERM, resorting to SIGKILL: given PID did not die within timeout 
WARN[0010] StopSignal SIGTERM failed to stop container test in 10 seconds, resorting to SIGKILL 
DEBU[0010] Sending signal 9 to container ... 
Error: given PID did not die within timeout
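
For reference, the escalation visible in that log (signal 15, wait for the timeout, then signal 9) can be sketched roughly as below. This is only an illustration, not podman's actual code; `stop_with_timeout` and its messages are made-up names, and the one-second polling interval is an assumption.

```shell
# Hypothetical helper, not podman's implementation: it only mirrors the
# SIGTERM -> timeout -> SIGKILL order seen in the debug log.
# Usage: stop_with_timeout PID TIMEOUT_SECONDS
stop_with_timeout() {
    swt_pid=$1
    swt_timeout=$2

    # Ask politely first (signal 15).
    kill -TERM "$swt_pid" 2>/dev/null || { echo "not running"; return 0; }

    # Poll once per second until the process is gone or the timeout expires.
    swt_waited=0
    while [ "$swt_waited" -lt "$swt_timeout" ]; do
        kill -0 "$swt_pid" 2>/dev/null || { echo "stopped by SIGTERM"; return 0; }
        sleep 1
        swt_waited=$((swt_waited + 1))
    done
    kill -0 "$swt_pid" 2>/dev/null || { echo "stopped by SIGTERM"; return 0; }

    # Still alive after the timeout: escalate (signal 9).
    kill -KILL "$swt_pid" 2>/dev/null
    sleep 1
    if kill -0 "$swt_pid" 2>/dev/null; then
        # Roughly the situation podman reports as "given PID did not die".
        echo "still alive after SIGKILL"
        return 1
    fi
    echo "stopped by SIGKILL"
}
```

Note that `kill -0` also succeeds for a child that has exited but has not been reaped yet, so the result is most reliable when the function runs in the shell that started the process, or against a process that is not its child.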

workaround to get back to a stopped state: pkill conmon

fwiw, attempting the same on Void musl, SIGTERM is logged but it appears to go straight to SIGKILL immediately afterward, which does work:

DEBU[0000] Stopping container ... (PID ...) 
DEBU[0000] Sending signal 15 to container ... 
DEBU[0000] Sending signal 9 to container ... 
DEBU[0000] Cleaning up container ... 

nekopsykose commented 6 months ago

as a first step, could you juggle around the versions of stuff here to make podman etc match (void is 4.9.3)? maybe this is a 5.0 regression
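
When comparing setups, it helps to record the versions of everything involved on both machines. A small sketch, assuming each tool accepts `--version` (podman, crun, runc and conmon do); `report_versions` is a made-up helper name:

```shell
# Hypothetical helper: print the first line of `TOOL --version` for each
# tool given, or note that the tool is not on PATH.
report_versions() {
    for rv_tool in "$@"; do
        if command -v "$rv_tool" >/dev/null 2>&1; then
            printf '%s: %s\n' "$rv_tool" "$("$rv_tool" --version 2>&1 | head -n 1)"
        else
            printf '%s: not installed\n' "$rv_tool"
        fi
    done
}

report_versions podman crun runc conmon
```

Running it on both Chimera and Void and diffing the output gives a quick list of what differs.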

q66 commented 6 months ago

i can reproduce this. interestingly, using crun kill (or runc kill, whichever runtime you are using) works as expected

q66 commented 6 months ago

another thing: using crun kill works, but crun kill --all (which podman seemingly invokes) does not. both exit immediately, but only plain crun kill results in the container being stopped

q66 commented 6 months ago

when you look at the container status:

$ cat /run/user/1000/crun/4b92c2aeb85d3299596a764de4abf8886b5e3405c9503ea767779ff0c07957b9/status
{
    "pid": 10309,
    "process-start-time": 1448781,
    "cgroup-path": "",
    "scope": "",
    "intelrdt": "",
    "rootfs": "/home/q66/.local/share/containers/storage/overlay/0f34c6c12f7880fa34ed8be8ce062f8db2937f5f9ffea082f20d3466c41de5cf/merged",
    "systemd-cgroup": false,
    "bundle": "/home/q66/.local/share/containers/storage/overlay-containers/4b92c2aeb85d3299596a764de4abf8886b5e3405c9503ea767779ff0c07957b9/userdata",
    "created": "2024-04-01T18:02:21.605445Z",
    "owner": "root",
    "detached": true,
    "external_descriptors": "[\"/dev/null\",\"pipe:[208100]\",\"pipe:[208101]\"]"
}

the cgroup path is empty, so an attempt to do a cgroup kill will never happen
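
A minimal sketch of checking this from a script, assuming the status file keeps the pretty-printed, one-key-per-line layout shown above. `crun_cgroup_path` and `has_cgroup` are made-up helper names, not part of crun:

```shell
# Hypothetical helpers; they rely on each "key": value pair sitting on its
# own line, as in the status output above.

# Print the value of "cgroup-path" from a crun status file (may be empty).
# Usage: crun_cgroup_path STATUS_FILE
crun_cgroup_path() {
    sed -n 's/^[[:space:]]*"cgroup-path":[[:space:]]*"\(.*\)",\{0,1\}[[:space:]]*$/\1/p' "$1"
}

# Succeed only if the status file records a non-empty cgroup path.
# Usage: has_cgroup STATUS_FILE
has_cgroup() {
    [ -n "$(crun_cgroup_path "$1")" ]
}

# Example (path pattern taken from the output above):
#   has_cgroup /run/user/1000/crun/<container id>/status || echo "no cgroup recorded"
```

If jq is available, `jq -r '.["cgroup-path"]'` on the same file is the sturdier option since it does not depend on the line layout.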

JamiKettunen commented 5 months ago

Included a workaround for this in https://github.com/chimera-linux/cports/pull/1726/commits/f0fd027a80c51a410242eaa7c4365ea88eba76c9 of the distrobox PR. It causes some strange additional log spam, but oh well:

DEBU[0000] Sending signal 15 to container cdb25d06b35b925c6e820e2260096d6ca2f9131ea6d97097fd34639a25609e39 
DEBU[0000] Sending signal 9 to container cdb25d06b35b925c6e820e2260096d6ca2f9131ea6d97097fd34639a25609e39 
ERRO[0000] container not running                        
DEBU[0000] Cleaning up container cdb25d06b35b925c6e820e2260096d6ca2f9131ea6d97097fd34639a25609e39 
DEBU[0000] Network is already cleaned up, skipping...   
WARN[0000] freezer not supported: openat2 /sys/fs/cgroup/cdb25d06b35b925c6e820e2260096d6ca2f9131ea6d97097fd34639a25609e39/cgroup.freeze: no such file or directory 
WARN[0000] lstat /sys/fs/cgroup/cdb25d06b35b925c6e820e2260096d6ca2f9131ea6d97097fd34639a25609e39: no such file or directory 
DEBU[0000] Successfully cleaned up container cdb25d06b35b925c6e820e2260096d6ca2f9131ea6d97097fd34639a25609e39 

q66 commented 5 months ago

the workaround looks reasonable, but it'd be nice to ask upstream what they think as we might be missing something

na-sa-do commented 1 month ago

This might have been semi-fixed in a recent update to conmon? I ran into this bug so I went to report it, but (after finding the issue, before posting) thought to double-check if there were any outstanding updates first; there was one to conmon, and after updating it I can properly stop distrobox containers, although it still gives the error messages. I didn't note which version of conmon I had before, though.

nekopsykose commented 1 month ago

the repro in the issue body still hangs if you remove the kill-all-only-rootful.patch from podman, so it seems the same as before