argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

argocd-repo-server does not implement a process reaper, resulting in many zombie processes. #8689

Open jkroepke opened 2 years ago

jkroepke commented 2 years ago

Summary

If you run a container without an init process as PID 1, which would normally reap zombie processes, zombies can pile up and eventually exhaust the maximum process limit on your system.

See also https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

Motivation

If you move from an existing Helm-based ecosystem to Argo CD, you may still depend on Helm plugins like helm-secrets.

helm-secrets is often used with the GPG ecosystem, which starts gpg-agent as a daemon. In daemon mode, the process double-forks and is adopted by PID 1. If the agent gets killed, it lingers as a zombie process because PID 1 never reaps it.

ps aux

```
I have no name!@argocd-repo-server-54b5888987-2k2n4:/home/argocd$ helm plugin list
NAME    VERSION DESCRIPTION
secrets 3.12.0  This plugin provides secrets values encryption for Helm charts secure storing
I have no name!@argocd-repo-server-54b5888987-2k2n4:/home/argocd$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000310+     1  0.2  0.2 788712 43076 ?        Ssl  10:16   0:00 argocd-repo-server --redis argocd-redis:6379 --logformat text --loglevel info
1000310+    16  0.0  0.0      0     0 ?        Z    10:16   0:00 [gpg-agent]
1000310+    17  0.0  0.0  78272  1292 ?        Ss   10:16   0:00 gpg-agent --homedir /app/config/gpg/keys --use-standard-socket --daemon
1000310+   252  0.0  0.0      0     0 ?        Z    10:17   0:00 [gpg-agent]
1000310+   255  0.0  0.0      0     0 ?        Zs   10:17   0:00 [gpg-agent]
1000310+   280  0.0  0.0      0     0 ?        Z    10:17   0:00 [gpg-agent]
1000310+   283  0.0  0.0      0     0 ?        Zs   10:17   0:00 [gpg-agent]
1000310+   284  0.0  0.0      0     0 ?        Z    10:17   0:00 [gpg-agent]
1000310+   292  0.0  0.0      0     0 ?        Zs   10:17   0:00 [gpg-agent]
1000310+   449  0.0  0.0      0     0 ?        Z    10:17   0:00 [gpg-agent]
1000310+   450  0.0  0.0      0     0 ?        Z    10:17   0:00 [gpg-agent]
...
```

More context: https://github.com/jkroepke/helm-secrets/issues/200
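
To illustrate the double-fork adoption described above, here is a minimal Linux-only sketch in Python (chosen for brevity; Argo CD itself is written in Go). The function name `double_fork_adopter` and the pipe-based reporting are assumptions made for the demo and have nothing to do with how gpg-agent is implemented. It only shows that once the intermediate process exits, the grandchild is re-parented to PID 1 or the nearest subreaper. In the repo-server container shown above, that is argocd-repo-server itself.

```python
import os
import time


def double_fork_adopter(timeout: float = 5.0) -> tuple:
    """Double-fork like a daemonizing process and report who adopts the grandchild.

    Returns (first_child_pid, adopter_pid), where adopter_pid is the parent
    PID the grandchild sees once the first child has exited.
    """
    read_fd, write_fd = os.pipe()
    first_child = os.fork()
    if first_child == 0:
        # First child: fork the "daemon", then exit immediately.
        try:
            os.close(read_fd)
            my_pid = os.getpid()
            if os.fork() == 0:
                # Grandchild: wait until the kernel has re-parented it.
                deadline = time.monotonic() + timeout
                while os.getppid() == my_pid and time.monotonic() < deadline:
                    time.sleep(0.01)
                os.write(write_fd, str(os.getppid()).encode())
        finally:
            os._exit(0)  # never return into the caller's code from a child
    # Original process: reap the first child, then read the grandchild's report.
    os.close(write_fd)
    os.waitpid(first_child, 0)
    with os.fdopen(read_fd) as report:
        return first_child, int(report.read())


if __name__ == "__main__":
    first, adopter = double_fork_adopter()
    print(f"first child: {first}, grandchild adopted by PID: {adopter}")
```

Whatever process ends up as the adopter is also the one that has to call `wait()` when the grandchild later exits. If it never does, the grandchild stays in state `Z`.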

Proposal

Using a golang library like https://github.com/ramr/go-reaper

OR

For building containers, using an external "init" program like https://github.com/krallin/tini
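
For reference, the core of what an in-process reaper does can be sketched as follows. This is a Python illustration under stated assumptions, not the go-reaper or tini implementation: the names `reap_zombies` and `install_reaper` are hypothetical. It only helps when the process is PID 1 (or a subreaper), because only then are orphaned processes handed to it.

```python
import os
import signal


def reap_zombies() -> list:
    """Collect every child that has already exited, without blocking.

    Returns the PIDs that were reaped by this call.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped


def install_reaper() -> None:
    """Reap on every SIGCHLD. Must be called from the main thread."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_zombies())
```

One caveat with this approach: waiting on `-1` collects any child, so it races with other code in the same process that waits for a specific child and may cause that code to lose the child's exit status. An external init such as tini avoids this, since the reaping then happens in a separate PID 1 process.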

jannfis commented 2 years ago

Actually, Argo CD uses tini from its entrypoint: https://github.com/argoproj/argo-cd/blob/master/entrypoint.sh

jkroepke commented 2 years ago

As you already mentioned here, there is a de-sync between upstream and the Helm charts.

Would you consider adding ENTRYPOINT ["/usr/local/bin/entrypoint.sh"] to the Dockerfile?

Then the Helm chart could just switch from command to args and inherit the upstream entrypoint.

cjc7373 commented 1 year ago

Other components can also have this problem, like cmp-server, as reported in #13026. Maybe we can recommend using tini in https://argo-cd.readthedocs.io/en/stable/operator-manual/config-management-plugins/#register-the-plugin-sidecar

blakepettersson commented 1 year ago

This is done with #12707, right?

metacoma commented 6 months ago

Encountered the same problem with the kcl plugin:

$ kubectl -n argocd exec -it `kubectl -n argocd get pod -l app.kubernetes.io/component=repo-server -o name` -c my-plugin -- ps auxw
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
999            1  0.0  0.5 5535156 85760 ?       Ssl  18:42   0:00 /var/run/argocd/argocd-cmp-server
999           62 21.6  0.0      0     0 ?        Z    18:44   0:30 [kclvm_cli] <defunct>
999           63  0.0  0.0      0     0 ?        Z    18:44   0:00 [kclvm_cli] <defunct>
999          160  0.5  0.0      0     0 ?        Z    18:45   0:00 [kclvm_cli] <defunct>
999          161  0.0  0.0      0     0 ?        Z    18:45   0:00 [kclvm_cli] <defunct>
999          205  0.6  0.0      0     0 ?        Z    18:45   0:00 [kclvm_cli] <defunct>
999          206  0.0  0.0      0     0 ?        Z    18:45   0:00 [kclvm_cli] <defunct>
999          250  0.6  0.0      0     0 ?        Z    18:45   0:00 [kclvm_cli] <defunct>
999          251  0.0  0.0      0     0 ?        Z    18:45   0:00 [kclvm_cli] <defunct>
999          288  0.0  0.0   7064  2816 pts/0    Rs+  18:47   0:00 ps auxw
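
If `ps` is not available in a plugin image, the same information can be read from `/proc`. The following is a minimal Linux-only sketch in Python. The helper name `list_zombies` is an assumption for illustration and is not part of Argo CD or any plugin.

```python
import os


def list_zombies(proc_root: str = "/proc") -> list:
    """Return (pid, command) pairs for every process in state 'Z'."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            stat_path = os.path.join(proc_root, entry, "stat")
            with open(stat_path, errors="replace") as f:
                stat = f.read()
        except OSError:
            continue  # the process vanished while we were scanning
        # Format: "<pid> (<comm>) <state> ..."; comm may itself contain ')',
        # so the state is the first field after the last ')'.
        comm = stat[stat.index("(") + 1 : stat.rindex(")")]
        state = stat[stat.rindex(")") + 1 :].split()[0]
        if state == "Z":
            zombies.append((int(entry), comm))
    return zombies


if __name__ == "__main__":
    for pid, comm in list_zombies():
        print(pid, comm)
```

Run inside the affected container, this should list the same defunct entries that `ps auxw` marks with `Z`.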