argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

'archive already exist' issue when multiple apps use the same helm chart #8008

Open awasilyev opened 2 years ago

awasilyev commented 2 years ago

Describe the bug

We have several different apps deployed from the same Helm chart, and we regularly get errors like this:

time="2021-12-22T07:58:31Z" level=warning msg="Failed to force refresh application details: rpc error: code = Unknown desc = `kustomize build /tmp/git@gitlab.com_xxxxxx_devops_apps/xxxxxx-sandbox-gke-europe-west3-xxxx/apps/data-query-gateway --load-restrictor LoadRestrictionsNone --enable-helm` failed exit status 1: Error: accumulating resources: accumulation err='accumulating resources from '../../../base/apps/data-query-gateway/': read /tmp/git@gitlab.com_xxxxxx_devops_apps/base/apps/data-query-gateway: is a directory': recursed accumulation of path '/tmp/git@gitlab.com_xxxxxx_devops_apps/base/apps/data-query-gateway': Error: failed to untar: a file or directory with the name /tmp/git@gitlab.com_xxxxxx_devops_apps/base/apps/data-query-gateway/charts/k8s-app-1.2.7.tgz already exists\n: unable to run: 'helm pull --untar --untardir /tmp/git@gitlab.com_xxxxxx_devops_apps/base/apps/data-query-gateway/charts --repo https://argo:D64mNn3f6ZU77UGm2YWy@gitlab.com/api/v4/projects/29393074/packages/helm/stable k8s-app --version 1.2.7' with env=[HELM_CONFIG_HOME=/tmp/kustomize-helm-837700893/helm HELM_CACHE_HOME=/tmp/kustomize-helm-837700893/helm/.cache HELM_DATA_HOME=/tmp/kustomize-helm-837700893/helm/.data] (is 'helm' installed?)

To Reproduce

Create several Kustomize-based applications that use the same Helm chart via helmCharts.
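
For illustration, a minimal sketch of such a kustomization.yaml (the chart name and version are taken from the log above; the repo URL, releaseName, and values file are placeholders):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

helmCharts:
  - name: k8s-app
    repo: https://example.com/helm/stable   # placeholder Helm repo
    version: 1.2.7
    releaseName: data-query-gateway
    valuesFile: values.yaml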

Expected behavior

No warnings during apply.

Version

$ argocd version
argocd: v2.2.1+122ecef
  BuildDate: 2021-12-17T01:31:40Z
  GitCommit: 122ecefc3abfe8b691a08d9f3cecf9a170cc8c37
  GitTreeState: clean
  GoVersion: go1.16.11
  Compiler: gc
  Platform: linux/amd64

Logs

see above
adrianlyjak commented 2 years ago

Has anyone figured out any workarounds for this? The error eventually goes away, but it is somewhat frustrating that it slows down the sync cycle. I have an ApplicationSet that creates an Application for every directory in a git repository. Those directories all import some shared Helm charts. The error is very intermittent, but when it occurs, it seems like Argo doesn't make another attempt at building the kustomization for a while (5-10 minutes).

adrianlyjak commented 2 years ago

Committing the entire /charts directory to source control seems to prevent helm from trying to download the chart, which works around the issue here with ArgoCD.

It seems like this bug may be more of an upstream issue in Kustomize or even Helm. I have a repro of the bug here: https://github.com/adrianlyjak/kustomize-bugs
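
For anyone wanting to try this workaround, a rough sketch of pre-pulling (vendoring) the chart into the kustomization's charts/ directory, reusing the same helm pull invocation that kustomize runs internally (the paths and repo URL are placeholders based on the log above):

# Pre-pull and untar the chart so `kustomize build --enable-helm` finds it and skips the download.
helm pull k8s-app --version 1.2.7 \
  --repo https://example.com/helm/stable \
  --untar --untardir base/apps/data-query-gateway/charts
git add base/apps/data-query-gateway/charts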

lfdominguez commented 1 year ago

This is affecting me too, and it is really annoying.

nice-pink commented 1 year ago

Combined with Argo CD notifications firing on apps entering the Unknown state, this is a huge pain, as it gets triggered several times per hour.

gaeljw commented 1 year ago

Having the same issue (ArgoCD 2.7.9, Kustomize v5).

Storing the Helm chart files in Git is not an option for me; it would be painful to manage any time there's an update to the Helm chart.

gaeljw commented 1 year ago

I guess this is a concurrency issue: if apps were "built" one by one, there would be no issue?

PickledChris commented 1 year ago

I've worked around this by adding

helmGlobals:
  chartHome: kube_prometheus_8_15_7_charts/

to my Kustomize file.

The issue is that the chart is locally cached under charts/, so ArgoCD uses that version. Forcing it to use a different cache directory, specific to the version of the chart you're depending on, will solve it.
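
Put together, a sketch of what that kustomization.yaml could look like (the chart name, repo URL, and version here are guesses inferred from the directory name above, not taken from the thread):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Version-specific cache directory so parallel builds of other apps don't race on a shared charts/.
helmGlobals:
  chartHome: kube_prometheus_8_15_7_charts/

helmCharts:
  - name: kube-prometheus                  # assumed chart name
    repo: https://example.com/helm/stable  # placeholder Helm repo
    version: 8.15.7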

gaeljw commented 1 year ago

To give a concrete example of a situation where this can happen and for which no workaround has been identified so far:

base/kustomization.yaml <-- has a helmChart reference
overlays/overlay1/kustomization.yaml
overlays/overlay2/kustomization.yaml

The Kustomize overlays each refer to the base, as usual with Kustomize.

Overlay1 and overlay2 are each referenced by an ArgoCD Application (via an ApplicationSet with a git generator in my case, but I don't think that matters).
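
For completeness, each overlay's kustomization.yaml is then just something like this (only the relative path comes from the layout above; any patches are omitted):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base   # pulls in the base, which declares the helmChart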

awiesner4 commented 1 year ago

We ran into this issue, as well. It's a pure race condition. The solution, at this time, is to vendor the chart locally in a centralized location next to your kustomization.

You can actually recreate the .tgz already exists error easily locally with just kustomize.

  1. Clear out the charts folder in your kustomize base
  2. Run kustomize build on overlay1
  3. As fast as possible after step 2, run kustomize build on overlay2

If you catch it fast enough, the charts directory and the associated chart .tgz file haven't yet been pulled by the overlay1 build when the overlay2 build starts, so overlay2 also attempts to pull. However, by the time overlay2 tries to write the file to disk, the overlay1 process has already written the chart, so it already exists.
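
A minimal sketch of that race on the command line, assuming the base/overlay layout described in the previous comment:

rm -rf base/charts                                  # 1. clear the cached chart
kustomize build --enable-helm overlays/overlay1 &   # 2. build overlay1 in the background
kustomize build --enable-helm overlays/overlay2     # 3. build overlay2 right away; both may try to pull
wait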

As mentioned above, we vendored our chart locally and pointed our base kustomization to it with:

helmGlobals:
  chartHome: ../../.helm/<chart>

Now, since the chart exists in the repo, ArgoCD won't completely clear it out, and the kustomize build process won't have to pull it.

It's also easy to manage your vendored charts with something like vendir.
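
For example, a rough vendir.yml sketch for the chart from the original report (the repository URL is a placeholder; adjust the paths to match whatever chartHome you point kustomize at):

apiVersion: vendir.k14s.io/v1alpha1
kind: Config
directories:
  - path: .helm                 # vendored charts end up under .helm/<path>
    contents:
      - path: k8s-app
        helmChart:
          name: k8s-app
          version: "1.2.7"
          repository:
            url: https://example.com/helm/stable   # placeholder Helm repo

Running vendir sync then commits the chart contents into the repo, so kustomize never needs to pull it at build time.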

gaeljw commented 1 year ago

I've opened an issue at Kustomize to track this: https://github.com/kubernetes-sigs/kustomize/issues/5271

Feel free to comment there to raise awareness.

From my experience, it takes a very long time to get things moving in Kustomize though. I would love it if ArgoCD could somehow work around it, maybe by adding some small delay between apps.

gaeljw commented 1 year ago

FYI in my case, the workaround is to use the new "progressive sync" feature of ApplicationSets so that there's no concurrency for these cases.

EDIT: surprisingly, even with a progressive sync of one app at a time, I still get the error from time to time :'(
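
For reference, a rough sketch of that kind of progressive sync setup (the names, labels, repo URL, and git generator layout are assumptions, not taken from the thread; the progressive syncs feature also has to be enabled on the ApplicationSet controller):

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: overlays
spec:
  generators:
    - git:
        repoURL: https://gitlab.com/example/apps.git
        revision: main
        directories:
          - path: overlays/*
  # Progressive sync: roll the generated Applications out step by step instead of all at once.
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:
            - key: overlay
              operator: In
              values: ["overlay1"]
        - matchExpressions:
            - key: overlay
              operator: In
              values: ["overlay2"]
  template:
    metadata:
      name: '{{path.basename}}'
      labels:
        overlay: '{{path.basename}}'   # label matched by the RollingSync steps above
    spec:
      project: default
      source:
        repoURL: https://gitlab.com/example/apps.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: default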

MrSaints commented 1 year ago

Encountered this recently, but it turned out to be inconsequential / a red herring.

The real issue was invalid YAML in our Helm values defined in our kustomization.yml. After we fixed it, the app refreshed and synced appropriately.