argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

post install for dex helm chart does not work #4288

Open jetersen opened 3 years ago

jetersen commented 3 years ago


Describe the bug

The post-install job is not created when installing a clean version of dex through Argo CD v1.7.4+f8cbd6b: https://github.com/helm/charts/blob/master/stable/dex/templates/job-grpc-certs.yaml

I ended up using helm --namespace dex install -f values.yaml dex stable/dex for the initial installation.

To Reproduce

Create a Helm chart app for dex.

Expected behavior

The dex deployment is able to get its gRPC certs from the job.

Screenshots

N/A

Version

v1.7.4+f8cbd6b

Logs

N/A

jessesuen commented 3 years ago

We are supposed to map post-install to PostSync hooks. Possibly a regression?

https://argoproj.github.io/argo-cd/user-guide/helm/#helm-hooks
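
The mapping mentioned above can be sketched roughly as follows. This is a minimal illustration, not Argo CD's actual code: the table covers only the install/upgrade hooks discussed in this thread, and the function name `argocd_phases` is hypothetical. Any Helm hook not in the table is simply treated as unmapped here, which is an assumption of the sketch.

```python
from typing import List

# Illustrative table only: Helm hook names and the Argo CD sync phase
# each one is treated as, per the discussion in this thread.
HELM_TO_ARGOCD_PHASE = {
    "pre-install": "PreSync",
    "pre-upgrade": "PreSync",
    "post-install": "PostSync",
    "post-upgrade": "PostSync",
}


def argocd_phases(helm_hook_annotation: str) -> List[str]:
    """Translate a comma-separated helm.sh/hook value into sync phases.

    Hooks that are not in the table are ignored (assumption of this sketch).
    """
    phases: List[str] = []
    for name in helm_hook_annotation.split(","):
        phase = HELM_TO_ARGOCD_PHASE.get(name.strip())
        if phase is not None and phase not in phases:
            phases.append(phase)
    return phases


if __name__ == "__main__":
    # A job annotated for both post-install and post-upgrade lands in one phase.
    print(argocd_phases("post-install,post-upgrade"))
```

Under this mapping a post-install job only becomes eligible to run in the PostSync phase, which is what the rest of the thread is about.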

jetersen commented 3 years ago

I think the fact that post-install maps to post sync is a mistake. In the case of dex, the pod needs these certs to spin up, so the sync will never finish.

kvaps commented 3 years ago

Hi, I have a similar problem running kubernetes-in-kubernetes. I have a hook which generates configmaps with kubeconfig files, and some components rely on these configmaps, so synchronization never finishes.

The UI and log say:

waiting for completion of hook batch/Job/kubernetes-kubeadm-tasks and 2 more hooks

but the hooks are never executed.

However, helm itself works fine.

kvaps commented 3 years ago

"I think the fact that post-install maps to post sync is a mistake."

I agree. The Helm documentation explains:

but Argo thinks differently:

In other words, Helm does not wait until all resources reach a healthy state.

jessesuen commented 3 years ago

We realize it's not a perfect mapping, but given that Argo CD applications do not have lifecycle information, it's not currently possible to have a perfect mapping.

jessesuen commented 3 years ago

FYI, I do sometimes recommend https://github.com/fluxcd/helm-operator when perfect helm installations are needed. Keep in mind that by doing this you lose out on Argo CD project resource white/blacklisting and kubectl apply semantics.

kvaps commented 3 years ago

@jessesuen you don't need any lifecycle information to run post-upgrade and post-install hooks after all the resources have been applied and before they reach the Ready state, unlike Argo CD's PostSync hook.
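
To make the ordering difference concrete, here is a toy model of the two behaviours as they are described in this thread. It is only a sketch under stated assumptions: `Workload`, `Hook`, `helm_style`, and `postsync_style` are hypothetical names, health is reduced to "every required artifact exists", and nothing here reflects how Helm or Argo CD is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Workload:
    name: str
    # Artifacts (e.g. a certs secret) the workload needs before it can be healthy.
    needs: Set[str] = field(default_factory=set)


@dataclass
class Hook:
    name: str
    # Artifacts the hook job creates when it runs.
    produces: Set[str] = field(default_factory=set)


def all_healthy(workloads: List[Workload], artifacts: Set[str]) -> bool:
    return all(w.needs <= artifacts for w in workloads)


def helm_style(workloads: List[Workload], hooks: List[Hook]) -> bool:
    """Apply resources, run post-install hooks right away, then check health."""
    artifacts: Set[str] = set()
    for hook in hooks:
        artifacts |= hook.produces
    return all_healthy(workloads, artifacts)


def postsync_style(workloads: List[Workload], hooks: List[Hook]) -> bool:
    """Apply resources, but run hooks only once every workload is healthy."""
    artifacts: Set[str] = set()
    if not all_healthy(workloads, artifacts):
        return False  # the health gate never opens, so the hooks never run
    for hook in hooks:
        artifacts |= hook.produces
    return all_healthy(workloads, artifacts)


if __name__ == "__main__":
    dex = [Workload("dex", needs={"grpc-certs"})]
    certs_job = [Hook("job-grpc-certs", produces={"grpc-certs"})]
    print("helm-style completes:", helm_style(dex, certs_job))
    print("PostSync-style completes:", postsync_style(dex, certs_job))
```

In this model the dex case completes with the Helm-style ordering and stalls with the PostSync-style ordering, because the workload depends on something only the hook can create.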

kingdonb commented 3 years ago

According to my report in adfinis-sygroup/helm-charts#143, PostSync does not really wait for all created resources to become healthy. It is an obstacle there too, where migrations run in a post-install hook, similar to the report in this issue.

There is a sentry-relay deploy which will not become healthy until the post-install hooks have run. I'm able to exec into the sentry-web pod and run sentry upgrade by hand, which does result in a healthy sentry-relay deploy, but three other pods created from the same Application are still in CrashLoopBackOff. Nonetheless, it's enough to trick ArgoCD into thinking it's time to run (a number of other needed) post-install hooks; they trigger and succeed, ultimately making the whole Application healthy.

This does not appear to be completely correct behavior according to the description provided here, in any case.

  • PostSync - Executes after all Sync hooks completed and were successful, a successful application, and all resources in a Healthy state.

This appears to conflict with my observations, whether or not it is intended to perfectly emulate the behavior of helm itself. What's worse, if it were fixed so that Argo behaved exactly according to this description, it seems likely that it would not be possible to get those post-install/post-upgrade hooks to run at all via Argo.

kingdonb commented 3 years ago

Maybe I'm wrong about that... I did some more digging to understand why this might work as it does:

https://github.com/sentry-kubernetes/charts/blob/develop/sentry/templates/deployment-relay.yaml#L92

deployment-relay.yaml describes a health check (liveness/readiness) but sessions-consumer, snuba-outcomes-consumer, and snuba-replacer all might not. So there is a reasonable way to understand how ArgoCD would consider the application Healthy even though all three of those are still in CrashLoopBackOff.

So ArgoCD waits for relay, but not the others. It is at least potentially consistent with the way PostSync has been described, but it definitely also conflicts with the way that helm post-install and post-upgrade hooks are used by (at least dex and sentry) charts.