Closed krausemi closed 1 year ago
Could be related, but I think this one only applies when requests time out: https://github.com/argoproj/argo-cd/issues/9180
As far as I can see, the sidecar has the cmp server directly as its entrypoint:
command: [/var/run/argocd/argocd-cmp-server]
If the entrypoint is an init process (such as tini), it takes care of reaping the leftover processes; that is exactly what tini is for.
@krausemi Could you please try adding tini to your sidecar container, either by setting it as the ENTRYPOINT in the Dockerfile or by using it as the command for the sidecar, and then move the argocd-cmp-server invocation to the args section? This way tini will run as PID 1 and will reap the zombies, as that is its intended purpose.
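A minimal sketch of that split in the sidecar spec, assuming tini is available at /usr/bin/tini inside the image (the path used in the stock Argo CD image):

```
# Sketch: tini runs as PID 1, argocd-cmp-server becomes its child
command: ["/usr/bin/tini", "--"]
args: ["/var/run/argocd/argocd-cmp-server"]
```

The `--` separates tini's own flags from the child command it should supervise.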
Thanks for the hint @gczuczy.
I'll give it a try and post the results here.
Here's a good reading on the pid1 init's responsibilities: https://github.com/krallin/tini/issues/8
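The reaping duty described there is easy to demonstrate locally. The sketch below (assumes a Linux procps `ps`) creates a short-lived child whose parent never calls wait(), which is exactly what happens when argocd-cmp-server's bash children exit:

```shell
# The inner bash backgrounds a short-lived child, then execs into a
# long sleep. The sleep inherits the child but never wait()s for it,
# so once the child exits it stays in state Z (defunct).
bash -c 'sleep 0.2 & exec sleep 3' &
parent=$!
sleep 1   # give the child time to exit and turn into a zombie
ps --ppid "$parent" -o stat=,comm=
```

An init process such as tini sitting at PID 1 would receive SIGCHLD and wait() on such children, which is precisely what removes them from the process table.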
Okay, I've tried the solution with tini and it works as it should - there are no more zombie processes. :) I'll also update the parent issue within the argocd-vault-plugin project, so that they can adapt their documentation.
Thank you for your support!
What did I change to make it work?
Since the argocd container image used here already has tini installed, I just added the ENTRYPOINT to the Dockerfile and passed "/var/run/argocd/argocd-cmp-server" as the argument via the extraContainers parameters in the argocd values file.
ARG ARGOCD_VERSION=2.5.7
ARG AVP_VERSION=1.13.1

FROM registry.access.redhat.com/ubi8 as download
# Re-declare the build arg so it is visible inside this stage
ARG AVP_VERSION
RUN mkdir /custom-tools/ && \
    cd /custom-tools/ && \
    curl -L https://github.com/argoproj-labs/argocd-vault-plugin/releases/download/v${AVP_VERSION}/argocd-vault-plugin_${AVP_VERSION}_linux_amd64 -o argocd-vault-plugin && \
    chmod +x argocd-vault-plugin

FROM quay.io/argoproj/argocd:v${ARGOCD_VERSION} as target
COPY certs.crt /etc/ssl/certs/
COPY --from=download /custom-tools/argocd-vault-plugin /usr/local/bin/
# tini becomes PID 1 and reaps the zombie processes left behind by plugin sub-commands
ENTRYPOINT [ "/usr/bin/tini" ]
extraContainers:
  - name: avp-helm
    args:
      - /var/run/argocd/argocd-cmp-server
    image: <internal-registry>/path/to/image/argocd-vault-plugin-sidecar:<internal tag>
    securityContext:
      runAsNonRoot: true
      runAsUser: 999
    volumeMounts:
      - mountPath: /var/run/argocd
        name: var-files
      - mountPath: /home/argocd/cmp-server/plugins
        name: plugins
      - mountPath: /tmp
        name: tmp-dir
      - mountPath: /home/argocd/cmp-server/config/plugin.yaml
        subPath: avp-helm.yaml
        name: cmp-plugin
Introduction
We are using Argo CD in combination with the argocd-vault-plugin (https://github.com/argoproj-labs/argocd-vault-plugin). The plugin has been installed via the sidecar-container approach and works as expected. However, it looks like the argocd-cmp-server does not correctly reap the bash commands it executes: after execution they remain... as zombies.
Bug description
After the defined generate step for Helm charts (avp-helm) runs, the executed bash sub-processes are stuck in the "defunct" state instead of being terminated.
The number of zombie processes increases rapidly, and after a few hours the process limit of the underlying node is reached, at which point the node itself becomes unusable.
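The ceiling being hit here is the kernel-wide PID limit of the node. As a quick sanity check (on Linux), it can be read like this:

```shell
# Maximum number of PIDs the kernel will hand out; once zombies
# exhaust this, no new processes can be started on the node
cat /proc/sys/kernel/pid_max
```

Each zombie still occupies a slot in the process table, so they count against this limit even though they consume no CPU or memory.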
Logs
Using the --verbose-sensitive-output parameter did not produce any more output than the logs above (or I did something wrong :D).
Installation setup
Used Dockerfile for image creation
Used values for argocd-vault-plugin sidecar installation
Used config for the avp-helm-sidecar-container
ArgoCD example application
How to reproduce
kubectl exec -it -n <namespace> <pod name> -c <container name (in my case avp-helm)> -- bash -c "ps -ef | head"
Expected behavior
I would expect the sub-processes of the argocd-cmp-server, which are spawned within the avp-helm sidecar container, to be terminated and reaped after execution instead of being left behind as zombies.
Workaround
As a temporary workaround we implemented a cronjob that restarts the argocd-repo-server pod (which contains the avp-helm sidecar container) on a daily basis. This kills the accumulated zombie processes, so the node does not reach its process limit.
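A minimal sketch of such a cronjob, here as a Kubernetes CronJob that issues a rollout restart (the service account name, schedule, and kubectl image are assumptions; the service account needs RBAC permission to patch the deployment):

```
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart-argocd-repo-server   # hypothetical name
spec:
  schedule: "0 3 * * *"              # once a day, at night
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deployment-restart   # needs patch rights on deployments
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl  # any image with kubectl works
              command:
                - kubectl
                - rollout
                - restart
                - deployment/argocd-repo-server
                - -n
                - argocd
```

A rollout restart replaces the pod gracefully, which is preferable to deleting it outright.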