argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Umbrella Helm chart approach - performance issue #8728

Open mirkoszy opened 2 years ago

mirkoszy commented 2 years ago


Describe the bug

Hi,

I am using the umbrella chart approach for my deployments, but I have a performance issue. Within a single application I have multiple (more than 3) dependencies (a common app Helm chart and other dependencies, e.g. Config Connector resources). All applications have [almost] the same dependencies. The total number of Helm charts used by my apps is around 10-15. I have hundreds of applications (currently about 300, in the near future 800).

The first installation takes a few hours. The repo server consumes a huge amount of memory and CPU. It is scaled up to many replicas (tested with 2, 5, and 10; with 10 replicas performance is a little better) and has no limits for CPU and memory. Errors that occur during this process include: "Unknown" current sync status, "helm dependency build failed timeout after 5m", "DeadlineExceeded desc = context deadline exceeded", and "Manifest generation error (cached)".

Every refresh takes many seconds (I had to increase ARGOCD_EXEC_TIMEOUT to 5 minutes because the default timeout of 90s was not enough).

Changes to many applications at the same time lead to an "Unknown" current sync status in almost all of the applications, and again it takes a lot of time to refresh/sync all apps.

I am using a single Helm repository; all dependent charts are located there. I added this repo as a stable repository. [1]

To Reproduce

Hundreds of applications with many dependencies in Chart.yaml

Expected behavior

I accept that the first Helm dependency build can take a lot of time, but every subsequent change introduced to an application (other than changes to the chart dependency versions) should be fast and should not impact repo server performance.

Screenshots

(screenshot omitted)

Resource consumption:

(screenshot omitted)

Version

v2.2.5+8f981cc

[1] https://argo-cd.readthedocs.io/en/stable/faq/#argo-cd-cannot-deploy-helm-chart-based-applications-without-internet-access-how-can-i-solve-it

crenshaw-dev commented 2 years ago

We could add a config option to skip git clean for Helm repos with lots of large dependencies.

mirkoszy commented 2 years ago

I will add an overview of our conversation on Slack:

I asked about it on Slack and did some debugging; it seemed that the triggers of the issue were:

@crenshaw-dev I checked whether skipping git clean solves this issue, and fortunately it does. I will try to prepare a PR with the solution.

mirkoszy commented 2 years ago

Not such good news - I ran some tests and found the following issues:

Both issues are caused by not cleaning the built chart dependencies. But the performance issue is gone!

crenshaw-dev commented 2 years ago

Unfortunately, that output is hard to handle.

Is helm dependency update expensive? If not, we could skip helm dependency list and always update before template.

mirkoszy commented 2 years ago

Helm dependency update is very expensive, both in time and resources.

I implemented my idea: run helm dependency list, then helm dependency update only if needed, and after that helm template. It is deployed on our testing environments and works fine so far.

I will share my draft code.
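
For illustration only, a rough sketch of that flow (not the actual draft, which is linked in the next comment): it shells out to helm and reuses the same "WARNING"/"wrong" heuristic on the helm dependency list output that appears in the snippet quoted further down. The package and function names are invented for the example.

    package helmdeps

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    // dependenciesNeedUpdate applies a simple heuristic to the output of
    // `helm dependency list`: treat any mention of "WARNING" or "wrong" as a
    // sign that the built dependencies are missing or out of sync with Chart.yaml.
    func dependenciesNeedUpdate(listOutput string) bool {
        return strings.Contains(listOutput, "WARNING") || strings.Contains(listOutput, "wrong")
    }

    // templateChart rebuilds chart dependencies only when the dependency list
    // reports a problem, then renders the chart with `helm template`.
    func templateChart(chartPath string) (string, error) {
        listOut, err := exec.Command("helm", "dependency", "list", chartPath).CombinedOutput()
        if err != nil {
            return "", fmt.Errorf("helm dependency list failed: %v: %s", err, listOut)
        }
        if dependenciesNeedUpdate(string(listOut)) {
            // The expensive step: run it only when the list output looks wrong.
            if out, err := exec.Command("helm", "dependency", "update", chartPath).CombinedOutput(); err != nil {
                return "", fmt.Errorf("helm dependency update failed: %v: %s", err, out)
            }
        }
        manifests, err := exec.Command("helm", "template", chartPath).Output()
        return string(manifests), err
    }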

mirkoszy commented 2 years ago

https://github.com/mirkoszy/argo-cd/commit/a5b623aee0235412a3556f7e7075c855a4e133b8

crenshaw-dev commented 2 years ago
    if (strings.Contains(dependencyStatus, "WARNING")) || (strings.Contains(dependencyStatus, "wrong")) {
        return true
    }

We could write a unit test that would break if the Helm behavior changes. Probably still want to feature-gate it, but I'm all for it if it works.
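
For example, a table-driven test along these lines (written against the hypothetical dependenciesNeedUpdate helper from the sketch above; the sample outputs are invented and would need to be replaced with fixtures captured from a real helm run) would start failing if Helm reworded its status messages:

    package helmdeps

    import "testing"

    func TestDependenciesNeedUpdate(t *testing.T) {
        // Illustrative outputs only; in a real test, capture actual
        // `helm dependency list` output as fixtures so the test breaks
        // if Helm changes its wording.
        cases := []struct {
            name   string
            output string
            want   bool
        }{
            {"all ok", "NAME\tVERSION\tREPOSITORY\tSTATUS\ncommon\t1.0.0\thttps://charts.example.com\tok\n", false},
            {"wrong version", "NAME\tVERSION\tREPOSITORY\tSTATUS\ncommon\t1.0.0\thttps://charts.example.com\twrong version\n", true},
            {"warning about missing chart", "WARNING: dependency common is missing in charts/ directory\n", true},
        }
        for _, tc := range cases {
            t.Run(tc.name, func(t *testing.T) {
                if got := dependenciesNeedUpdate(tc.output); got != tc.want {
                    t.Errorf("dependenciesNeedUpdate(%q) = %v, want %v", tc.output, got, tc.want)
                }
            })
        }
    }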

kklimonda-fn commented 2 years ago

Adding here that this does not only affect Helm charts but in fact every sufficiently large monorepo-based deployment with multiple applications. I'm hitting this issue with a custom config management plugin that pulls dependencies meant to be shared between all applications, but they are then deleted as Argo CD refreshes the repository for the next application.

An option to disable git clean, or even the possibility of changing git clean -fdx into git clean -fd (i.e. not cleaning ignored files), would at least partially alleviate this problem. One would still have to re-download all dependencies on each new commit to the repo with applications, but at least that cost could be amortized across a group of applications.

Keeping the cache alive between repo updates would be an interesting problem to solve too. This would probably have to be integrated into the plugin anyway, and could now be done as part of init if at least the repo URL and reference were available as environment variables.

crenshaw-dev commented 2 years ago

@kklimonda-fn a hacky cache solution would be to mount a volume for your plugin to symlink to a dependencies directory before manifest generation.

grzegdl commented 2 years ago

Were you able to solve this in some way? We are hitting the exact same issue. Setting up webhooks and manifest-generate-paths doesn't seem to help.

mirkoszy commented 1 year ago

Any update?

xmj commented 2 weeks ago

Red Hat has documented a number of workarounds for this issue at https://access.redhat.com/solutions/6818871 (increasing timeouts, memory, and CPU, limiting parallelism, etc.), but none of them fixes it.

Have any of you found good workarounds to this?