containers / conmon-rs

An OCI container runtime monitor written in Rust
Apache License 2.0

Pod cannot be deleted due to missing container startup command #2242

Closed Bevisy closed 2 weeks ago

Bevisy commented 3 months ago

What happened?

Create a pod using the following pod-config.json and container-config-nginx.json:

# cat pod-config.json
{
    "metadata": {
        "name": "nginx-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83djaidwnduwk28bcsb"
    },
    "log_directory": "/tmp",
    "linux": {
    }
}

# cat container-config-nginx.json
{
  "metadata": {
      "name": "nginx-0"
  },
  "image":{
      "image": "docker.io/library/nginx:latest"
  },
  "command": [
      "top"
  ],
  "linux": {
  }
}

Creating the container then fails:

# crictl run container-config-nginx.json pod-config.json
FATA[0012] running container: creating container failed: rpc error: code = Unknown desc = create container: create result: internal/proto/conmon.capnp:Conmon.createContainer: Failed: child command exited with: 1: executable file `top` not found in $PATH: No such file or directory

At this point, the container process on the node becomes a zombie process, and the pod cannot be deleted.

      1   15487   15486    2552 pts/1      11037 Sl       0   0:00 /usr/bin/crio-conmonrs --runtime /usr/bin/crio-crun --runtime-dir /var/lib/containers/storage/overlay-containers/7d46c4f2908be02f02465923ca1aca87295e8872231dae236287fe69209fdec9/userdata --runtime-root /run/crun --log-level info --log-driver systemd --cgroup-manager systemd
  15487   15496   15496   15496 ?             -1 Ss       0   0:00  \_ /pause
  15487   15509   15486    2552 pts/1      11037 Z        0   0:00  \_ [3] <defunct>
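
For context on the `Z` / `<defunct>` entry in the listing above: a process shows up this way when it has exited but its parent has not yet collected its exit status. The following is a minimal Python sketch of that general mechanism only, not of conmon-rs or crun internals. It assumes Linux with `/proc` mounted, and `proc_state`, `wait_for_state`, and `demo` are hypothetical helper names introduced for illustration.

```python
import os
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' to reach the state field.
    return data.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process reaches the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != wanted and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    return state


def demo():
    # The child exits right away with status 1, similar to a container
    # command that fails to start.
    child = subprocess.Popen([sys.executable, "-c", "raise SystemExit(1)"])

    # Until the parent waits on it, the exited child stays in state 'Z'.
    state_before = wait_for_state(child.pid, "Z")

    # wait() collects the exit status, which lets the kernel remove the entry.
    exit_code = child.wait()
    still_listed = os.path.exists(f"/proc/{child.pid}")
    return state_before, exit_code, still_listed


if __name__ == "__main__":
    print(demo())
```

In this sketch the zombie disappears as soon as the parent calls `wait()`. In the listing above, the defunct child of `crio-conmonrs` remains, which suggests its exit status was never collected.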

What did you expect to happen?

The container process should exit normally instead of becoming a zombie process.

How can we reproduce it (as minimally and precisely as possible)?

See the steps under "What happened?" above.

Anything else we need to know?

No response

CRI-O and Kubernetes version

```console
conmonrs version: v0.6.3
```

OS version

```console
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux lima-crio 6.1.0-21-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
```

Additional environment details (AWS, VirtualBox, physical, etc.)

Nothing else

Bevisy commented 3 months ago

See also: https://github.com/cri-o/cri-o/issues/8272#issuecomment-2158040886

Bevisy commented 3 months ago

While investigating this issue, I also discovered that when I switch from crun to runc, zombie processes are not generated, and this issue does not occur. Related issue: https://github.com/containers/crun/issues/1482

saschagrunert commented 3 months ago

@Bevisy ah good point, I can reproduce the same with crun but not runc. I was going to find a way to fix it in conmon-rs but found no real solution yet. Maybe it has to be fixed in crun then :thinking: