Closed · jefflantz closed this 6 months ago
@MisterMX @janwillies any ideas?
We haven't found anything indicating the cause of this issue. However, we also have not encountered this problem in our own system.
This could be caused by the ArgoCD API clients, which never get closed. Because they are kept open forever, they might accumulate more and more data in memory. I would suggest starting the investigation there.
Adding some additional evidence to what @v0lkc says.
I added profiling to v0.5.0 and v0.6.0, and the picture is quite similar. After around 20 minutes running the controller with a sync interval of 5s and 5 MRs, the size of the HTTP client's buffers more than doubled.
What @v0lkc sent me in Slack also shows that the `.Close()` calls within ArgoCD's HTTP client only happen on error:
https://github.com/argoproj/argo-cd/blob/0c8bc1d61e8c9501c5aaabb2aafecc20aa43e1bb/util/http/http.go#L147-L151
In the `DumpResponse()` example, it is proposed differently.
Maybe we should open an issue over there and close our own clients?
`connector.Disconnect()` might be useful for that.
If somebody has time, feel free to open a PR and assign me as reviewer.
Hi @MisterMX, I sent a PR for this fix. Please have a look and see if it really solves the issue. 😃
What happened?
For context, my control plane runs in an environment that cannot directly reach our ArgoCD server endpoints, so I run a deployment on the cluster running ArgoCD to run the provider indirectly. What I've observed is that the memory allocated to this pod is increasing at roughly 1.3 GB/day, consistently across four environments. I've checked this using `kubectl top pods` and our stored metrics graphs. This is leading to pods being evicted for putting MemoryPressure on the underlying node.
How can we reproduce it?
First I manually create a ControllerConfig and Provider. There is a secret `argocd-credentials` with a token to ArgoCD at the key `authToken`, referenced by this ProviderConfig. In the cluster running ArgoCD, I'm using the following deployment spec for the controller:
What environment did it happen in?
Crossplane version: 1.14.3-up.1 (note this is running in Upbound SaaS)
Crossplane Provider argocd version: v0.5.0
The deployments are running in EKS, and I've seen it on clusters running EKS versions v1.24 and v1.28.