Enapter Helm Charts
MIT License

Zombie processes from readiness / liveness probes #50

Open viceice opened 2 years ago

viceice commented 2 years ago

I've got a lot of zombie processes.

       0 2382791  0.8  0.2 712820  8716 ?        Sl   May17  88:11 /var/lib/rancher/k3s/data/8c2b0191f6e36ec6f3cb68e2302fcc4be850c6db31ec5f8a74e4b3be403101d8/bin/containerd-shim-runc-v2 -namespace k8s.io -id 21da19fa6f824bc4dd21aafe9148d07e95886390c7ef9caad10dcb181b585f58 -address /run/k3s/containerd/containerd.sock
   65535 2382814  0.0  0.0    972     4 ?        Ss   May17   0:00  \_ /pause
       0 1523052  1.5  0.6 648720 23988 ?        Ssl  May19  96:49  \_ keydb-server 0.0.0.0:6379
       0 2207040  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_readiness_] <defunct>
       0 2398171  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_readiness_] <defunct>
       0 2419093  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_readiness_] <defunct>
       0 2921360  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_readiness_] <defunct>
       0 2921383  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_liveness_l] <defunct>
       0 3941935  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_liveness_l] <defunct>
       0 3941970  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_readiness_] <defunct>
       0 3942325  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_readiness_] <defunct>
       0  517206  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_readiness_] <defunct>
       0  517224  0.0  0.0      0     0 ?        Z    May20   0:00      \_ [ping_liveness_l] <defunct>
       0 1082427  0.0  0.0      0     0 ?        Z    May21   0:00      \_ [ping_readiness_] <defunct>
       0 1292829  0.0  0.0      0     0 ?        Z    May21   0:00      \_ [ping_readiness_] <defunct>
       0 3612252  0.0  0.0      0     0 ?        Z    May21   0:00      \_ [ping_readiness_] <defunct>
       0 3999899  0.0  0.0      0     0 ?        Z    May22   0:00      \_ [ping_readiness_] <defunct>
       0  316962  0.0  0.0      0     0 ?        Z    May22   0:00      \_ [ping_readiness_] <defunct>
       0 1221761  0.0  0.0      0     0 ?        Z    May22   0:00      \_ [ping_readiness_] <defunct>
       0 2383088  0.0  0.0      0     0 ?        Z    May22   0:00      \_ [ping_readiness_] <defunct>
       0 2770818  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_readiness_] <defunct>
       0 2899448  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_readiness_] <defunct>
       0 4044700  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_readiness_] <defunct>
       0  235003  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_readiness_] <defunct>
       0 1007972  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_readiness_] <defunct>
       0 1203442  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_readiness_] <defunct>
       0 1203464  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_liveness_l] <defunct>
       0 1203886  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_liveness_l] <defunct>
       0 1203888  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_readiness_] <defunct>
       0 1204235  0.0  0.0      0     0 ?        Z    May23   0:00      \_ [ping_readiness_] <defunct>
       0 2429265  0.0  0.0      0     0 ?        Z    06:33   0:00      \_ [ping_readiness_] <defunct>
       0 2451119  0.0  0.0      0     0 ?        Z    06:42   0:00      \_ [ping_readiness_] <defunct>
       0 2466469  0.0  0.0      0     0 ?        Z    06:49   0:00      \_ [ping_readiness_] <defunct>
       0 2557980  0.0  0.0      0     0 ?        Z    07:32   0:00      \_ [ping_readiness_] <defunct>

values.yml

persistentVolume:
  enabled: true
  storageClass: local-path
  size: 1Gi

resources:
  requests:
    memory: 64Mi
  limits:
    memory: 256Mi

loadBalancer:
  enabled: true
  extraSpec:
    externalTrafficPolicy: Local
    loadBalancerIP: 1.2.3.4

existingSecret: some-secret

Antiarchitect commented 2 years ago

I couldn't reproduce this on bare metal. Could you please confirm whether it also happens on other platforms such as minikube or kind?

viceice commented 2 years ago

I'm seeing this on k3s.

Antiarchitect commented 2 years ago

Which k3s version are you using? I see a possibly related issue in the k3s project: https://github.com/k3s-io/k3s/issues/2722

viceice commented 2 years ago

I use v1.23.6+k3s1, so I don't think it's the containerd issue. I also run k3s on plain Ubuntu 20.04 virtual machines.

thejan2009 commented 2 years ago

I found the same bug, but with an in-house chart. Replacing the exec probes with any other probe type fixed the problem. It seems related to the notes at the bottom of https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes, although that doesn't quite make sense, since k3s uses containerd rather than dockershim.

The root cause seems to be that the probe processes are children of the pod's main process, which needs to reap them once they exit.
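
To illustrate the mechanism (not the chart's actual probe scripts), here is a minimal Linux-only sketch in Python. The helper names `spawn_child_that_exits`, `process_state`, `wait_until_zombie`, and `reap` are made up for this example. It assumes a `/proc` filesystem: a child that has exited stays in state `Z` until its parent calls `waitpid` on it, which is the step a main process like the one in the listing above apparently never performs.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state is the first field after the parenthesised command name.
    return stat.rsplit(")", 1)[1].split()[0]


def spawn_child_that_exits():
    """Fork a child that exits at once, standing in for a finished probe script."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    return pid


def wait_until_zombie(pid, timeout=5.0):
    """Poll until the child shows up as a zombie ('Z'); return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process_state(pid) == "Z":
            return True
        time.sleep(0.01)
    return False


def reap(pid):
    """Collect the child's exit status, which removes the zombie entry."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    child = spawn_child_that_exits()
    print("zombie before wait:", wait_until_zombie(child))
    print("exit code:", reap(child))
    print("state after wait:", process_state(child))
```

A parent that never calls `waitpid` (or an equivalent) keeps accumulating such entries for as long as it runs.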

Edit: this is also on k3s v1.23.6+k3s1 on Ubuntu 20.04.

viceice commented 2 years ago

For my own images I use dumb-init as the entrypoint, which does this job very well.
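
As a rough idea of what such an init wrapper does, here is a simplified sketch in Python. It is not dumb-init's code: dumb-init is written in C and also forwards signals, which this sketch leaves out. The function name `run_with_reaping` is hypothetical. The wrapper starts the real command as a child, then waits on any child, so orphans that get re-parented to it (as happens to PID 1 in a container) are reaped too.

```python
import os
import sys


def run_with_reaping(argv):
    """Run argv as a child and reap every child that exits until argv itself ends."""
    main_pid = os.fork()
    if main_pid == 0:
        try:
            os.execvp(argv[0], argv)  # child: become the real command
        finally:
            os._exit(127)  # only reached if exec failed
    while True:
        # -1 means "any child", so adopted orphans are collected here as well.
        pid, status = os.waitpid(-1, 0)
        if pid != main_pid:
            continue  # reaped some other child; keep waiting
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        return 128 + os.WTERMSIG(status)  # shell-style code for death by signal


if __name__ == "__main__":
    # Example: python wrapper.py keydb-server /etc/keydb/keydb.conf (paths are illustrative)
    if len(sys.argv) > 1:
        sys.exit(run_with_reaping(sys.argv[1:]))
```

With a wrapper like this as the container's first process, the server is no longer the process that has to reap finished probe children.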

viceice commented 1 year ago

I've now built a custom image that starts dumb-init before keydb, so hopefully no more zombies 🤞