crossplane-contrib / provider-argocd

Crossplane provider to provision and manage Argo CD objects
Apache License 2.0

Provider v0.7.0 memory leak in large-scale cluster #185

Closed: abelhoula closed this issue 1 month ago

abelhoula commented 1 month ago

What happened?

We have identified a performance bottleneck in the Argo CD provider, particularly in large-scale environments. The system is experiencing a memory leak caused by a large number of Kubernetes Secret objects. Profiling shows that methods such as k8s.io/api/core/v1.(*Secret).Unmarshal and caching operations such as k8s.io/client-go/tools/cache.(*Reflector).list.func1 account for the highest memory usage. Notably, even with just one Project resource, memory usage spiked to 1 GB.

How can we reproduce it?

This issue can be reproduced in an environment with a large number of Kubernetes Secret objects. The high memory usage appears while these Secrets are being unmarshaled and during the cache reflector's list operations.
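To see the pattern without a real cluster, the Go sketch below builds a synthetic SecretList response and decodes it with encoding/json, standing in for the initial LIST that a client-go reflector performs. The `secret` struct, the `scan-report-N` names and the 1 KiB payload are hypothetical; the real client decodes into k8s.io/api/core/v1 types and may use protobuf rather than JSON. The point is only that every listed Secret is decoded and held in memory at once, so memory grows with both the number and the size of the Secrets.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// secret mirrors only the parts of a core/v1 Secret this sketch needs.
// The real type lives in k8s.io/api/core/v1.
type secret struct {
	Metadata struct {
		Name      string            `json:"name"`
		Namespace string            `json:"namespace"`
		Labels    map[string]string `json:"labels,omitempty"`
	} `json:"metadata"`
	Data map[string][]byte `json:"data"` // []byte is base64 in JSON, as in Kubernetes
}

type secretList struct {
	Items []secret `json:"items"`
}

// buildListJSON produces a synthetic SecretList with n items, each carrying
// payloadBytes of data. Names and sizes are hypothetical.
func buildListJSON(n, payloadBytes int) []byte {
	list := secretList{Items: make([]secret, n)}
	payload := []byte(strings.Repeat("x", payloadBytes))
	for i := range list.Items {
		list.Items[i].Metadata.Name = fmt.Sprintf("scan-report-%d", i)
		list.Items[i].Metadata.Namespace = "default"
		list.Items[i].Data = map[string][]byte{"report": payload}
	}
	b, err := json.Marshal(list)
	if err != nil {
		panic(err)
	}
	return b
}

// decodeList stands in for a reflector's initial LIST: the whole response is
// decoded and every item is resident in memory at the same time.
func decodeList(raw []byte) (items int, dataBytes int) {
	var list secretList
	if err := json.Unmarshal(raw, &list); err != nil {
		panic(err)
	}
	for _, s := range list.Items {
		for _, v := range s.Data {
			dataBytes += len(v)
		}
	}
	return len(list.Items), dataBytes
}

func main() {
	for _, n := range []int{1_000, 10_000} {
		raw := buildListJSON(n, 1024)
		items, data := decodeList(raw)
		fmt.Printf("items=%d wire=%d bytes decoded data=%d bytes\n", items, len(raw), data)
	}
}
```

Running it under a heap profiler (for example with `go test -memprofile` or runtime/pprof) should show allocation dominated by the decode step, similar in shape to the profile reported above.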

What environment did it happen in?

Crossplane version: 1.14.6
Crossplane provider-argocd version: 0.7.0
[Image: profile001]

Root Cause

The high memory usage is likely caused by unmarshaling large numbers of Kubernetes Secret objects pulled from the API server, which suggests that either there are very many Secrets or the Secrets themselves are large. Cache reflection operations are also involved, indicating that the client is actively listing and watching many resources. This increases memory use, potentially because the listed objects are all stored in memory.
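The toy store below sketches why an informer cache grows with every Secret in the cluster, and how scoping it changes that. client-go informers can be restricted with a label selector or to a namespace; whether provider-argocd v0.7.0 does so is not established here. The `store` type, the `example.org/used-by` label and the namespaces are hypothetical. One simplification: a real informer sends the selector with the LIST/WATCH request so the API server never returns non-matching objects, whereas this sketch filters after listing.

```go
package main

import "fmt"

// object is a stripped-down stand-in for a Kubernetes object in a cache.
type object struct {
	Namespace, Name string
	Labels          map[string]string
}

// store is a toy informer cache. A nil selector caches everything; a non-nil
// selector keeps only objects carrying all of its key/value labels.
type store struct {
	selector map[string]string
	items    map[string]object
}

func newStore(selector map[string]string) *store {
	return &store{selector: selector, items: map[string]object{}}
}

// matches implements equality-based label selection only.
func (s *store) matches(obj object) bool {
	for k, v := range s.selector {
		if obj.Labels[k] != v {
			return false
		}
	}
	return true
}

// replace mimics the reflector's initial list: it swaps in the listed
// objects, keyed by namespace/name, keeping only those the selector admits.
func (s *store) replace(listed []object) {
	s.items = map[string]object{}
	for _, o := range listed {
		if s.matches(o) {
			s.items[o.Namespace+"/"+o.Name] = o
		}
	}
}

func (s *store) size() int { return len(s.items) }

func main() {
	var listed []object
	// Hypothetical cluster: 50,000 unrelated Secrets plus 3 the provider needs.
	for i := 0; i < 50_000; i++ {
		listed = append(listed, object{Namespace: "scanner-reports", Name: fmt.Sprintf("report-%d", i)})
	}
	for i := 0; i < 3; i++ {
		listed = append(listed, object{
			Namespace: "crossplane-system",
			Name:      fmt.Sprintf("argocd-creds-%d", i),
			Labels:    map[string]string{"example.org/used-by": "provider-argocd"},
		})
	}

	all := newStore(nil)
	all.replace(listed)
	scoped := newStore(map[string]string{"example.org/used-by": "provider-argocd"})
	scoped.replace(listed)

	fmt.Println("cache everything:", all.size())
	fmt.Println("label-scoped:", scoped.size())
}
```

The unscoped store holds all 50,003 objects while the scoped one holds 3, which is the difference between caching the whole cluster's Secrets and caching only those a controller actually reads.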

Recommended Fix

Please consider the following optimization options:

abelhoula commented 1 month ago

@maximilianbraun @MisterMX Any recommendations on addressing this memory leak issue?

abelhoula commented 1 month ago

This was caused by the huge number of Secrets in our clusters (Secrets created by the Trivy Operator). After the cleanup, memory usage was back to normal.
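Before cleaning up, it helps to find out which workload is producing the Secrets. The Go sketch below reads the output of `kubectl get secrets -A -o json` from stdin and counts Secrets per namespace and first owner-reference kind, printing the ten largest groups. Grouping by owner kind is an assumption about how the creating operator marks its Secrets; Secrets without owner references are reported under `<none>`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
)

// list decodes just enough of a v1 List of Secrets to attribute each one
// to a namespace and the kind of its first owner reference.
type list struct {
	Items []struct {
		Metadata struct {
			Namespace       string `json:"namespace"`
			OwnerReferences []struct {
				Kind string `json:"kind"`
			} `json:"ownerReferences"`
		} `json:"metadata"`
	} `json:"items"`
}

type bucket struct {
	Key   string
	Count int
}

// countSecrets groups Secrets by "namespace/ownerKind" and returns the
// groups sorted by descending count, then by key.
func countSecrets(raw []byte) ([]bucket, error) {
	var l list
	if err := json.Unmarshal(raw, &l); err != nil {
		return nil, err
	}
	counts := map[string]int{}
	for _, it := range l.Items {
		owner := "<none>"
		if len(it.Metadata.OwnerReferences) > 0 {
			owner = it.Metadata.OwnerReferences[0].Kind
		}
		counts[it.Metadata.Namespace+"/"+owner]++
	}
	buckets := make([]bucket, 0, len(counts))
	for k, c := range counts {
		buckets = append(buckets, bucket{Key: k, Count: c})
	}
	sort.Slice(buckets, func(i, j int) bool {
		if buckets[i].Count != buckets[j].Count {
			return buckets[i].Count > buckets[j].Count
		}
		return buckets[i].Key < buckets[j].Key
	})
	return buckets, nil
}

func main() {
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		panic(err)
	}
	buckets, err := countSecrets(raw)
	if err != nil {
		panic(err)
	}
	for i, b := range buckets {
		if i == 10 {
			break
		}
		fmt.Printf("%7d  %s\n", b.Count, b.Key)
	}
}
```

Usage: `kubectl get secrets -A -o json | go run main.go`. Note that this loads every Secret into memory once, which is acceptable for a one-off check but is the same cost the provider pays continuously.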

maximilianbraun commented 1 month ago

@abelhoula Was the large number of Secrets in the cluster where the provider was running, or in the Argo CD cluster?

abelhoula commented 1 month ago

@maximilianbraun Both: the provider and Argo CD, with more than 50k Secrets. I think the same applies to Crossplane as well; after removing Trivy and cleaning up the Secrets, everything is back to normal. https://github.com/crossplane/crossplane/issues/5272
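The two figures in this thread can be combined into a rough scale check. Assuming the ~1 GB peak (treated here as 1 GiB, 2^30 bytes) and the 50k Secrets describe the same cluster, and attributing the entire heap to cached Secrets, each Secret could account for at most about 21 KB on average. Because there were more than 50k Secrets and the heap holds other data too, the real per-Secret cost is lower; the sketch only shows that tens of thousands of Secrets are enough to explain a gigabyte-scale heap.

```go
package main

import "fmt"

// perObjectCeiling returns the average bytes each cached object could
// account for if the whole observed heap were attributed to those objects.
// Other allocations also live on the heap, so this is an upper bound.
func perObjectCeiling(heapBytes, objects int64) int64 {
	return heapBytes / objects
}

func main() {
	const gib = int64(1) << 30 // assumption: "1 GB" read as 1 GiB
	// Figures from this thread: ~1 GB peak, more than 50k Secrets.
	fmt.Println(perObjectCeiling(gib, 50_000), "bytes per Secret at most")
}
```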