crenshaw-dev opened 10 months ago
To avoid API-breaking changes, another suggestion could be:

- `status.resourcesGzip`
- `status.operationState.operation.sync.resourcesGzip`

With this approach the great majority of users wouldn't be impacted, as the new fields would only be used when the CRD size limit is exceeded.
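For illustration only, a minimal sketch of how such a `resourcesGzip` field could be populated: JSON-marshal the resource list, gzip it, and store the raw bytes (Kubernetes serializes `[]byte` fields as base64). Everything below except the suggested field name is hypothetical, not existing Argo CD code.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
)

// resourceStatus stands in for Argo CD's per-resource status entry.
type resourceStatus struct {
	Group, Kind, Name, Namespace, Status string
}

// compressResources returns the gzip-compressed JSON encoding of resources,
// i.e. the bytes that would be written to a resourcesGzip field.
func compressResources(resources []resourceStatus) ([]byte, error) {
	raw, err := json.Marshal(resources)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	resources := []resourceStatus{
		{Group: "apps", Kind: "Deployment", Name: "guestbook", Namespace: "default", Status: "Synced"},
	}
	gz, err := compressResources(resources)
	if err != nil {
		panic(err)
	}
	fmt.Printf("compressed %d resource entries into %d bytes\n", len(resources), len(gz))
}
```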
Yep, I like this. One question:

> Check if the new CRD state will exceed the 1.5mb etcd limit

How would you propose to perform that check?
> How would you propose to perform that check?

You have the computed `status` field state that is about to be persisted, right? I was thinking about just checking its size to drive the persisting logic.
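A minimal sketch of that size check, assuming the controller has the full object it is about to persist: serialize it and compare against a threshold somewhat below etcd's default ~1.5 MB request limit. The threshold and object shape here are illustrative, not actual Argo CD code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Leave headroom below the ~1.5 MB etcd default request limit.
const maxObjectBytes = 1_000_000

// shouldCompress reports whether the serialized object exceeds the threshold.
func shouldCompress(obj any) (bool, error) {
	raw, err := json.Marshal(obj)
	if err != nil {
		return false, err
	}
	return len(raw) > maxObjectBytes, nil
}

func main() {
	// Stand-in for the computed status that is about to be persisted.
	status := map[string]any{"resources": make([]int, 10)}
	compress, err := shouldCompress(status)
	if err != nil {
		panic(err)
	}
	fmt.Println("compress:", compress)
}
```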
> You have the computed `status` field state that is about to be persisted, right?

Not necessarily. Some places, e.g. persisting operation state, calculate only a patch: https://github.com/argoproj/argo-cd/blob/15eeb307eb03191e7581d8e616072de4fd4b20e0/controller/appcontroller.go#L1250

Even if we know the full `status` field contents, I see a few potential problems:

1) you're missing the sizes of the top-level `metadata`, `spec`, and `operation` keys
2) marshaling the status before every write operation could be a performance drag
I'd suggest a lightweight, configurable heuristic like "if it manages > N resources, compress."
> I'd suggest a lightweight, configurable heuristic like "if it manages > N resources, compress."

Yes, I like that too.
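A rough sketch of what that configurable heuristic could look like; the environment variable name and default threshold are made up for illustration and nothing below is existing Argo CD code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// compressionThreshold reads a configurable resource-count threshold,
// falling back to a default when it is unset or invalid.
func compressionThreshold() int {
	const defaultThreshold = 1000
	if v := os.Getenv("RESOURCE_COMPRESSION_THRESHOLD"); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return defaultThreshold
}

func main() {
	managedResources := 3000 // e.g. an app managing ~3k resources
	if managedResources > compressionThreshold() {
		fmt.Println("compress resource status before persisting")
	} else {
		fmt.Println("persist uncompressed")
	}
}
```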
Related: we are looking to run Argo CD at a large scale (5k+ applications) soon, and we're concerned about hitting GKE limits where the total size of any single resource type in etcd must stay under 800 MB. An option to always compress statuses, regardless of the number of resources, would be nice.
Google documentation for reference; I assume other cloud vendors have similar limits: https://cloud.google.com/kubernetes-engine/docs/concepts/planning-large-clusters
Any further updates on when Argo CD will be able to implement these improvements?
## Summary
Intuit had an app fail to sync when it hit ~3k resources managed in a single App. I believe the problem was that it attempted to update the sync status, which contained the status of all 3k resources, and we hit the k8s resource size limit.
We should provide more ways for the user to sacrifice certain features/conveniences to allow the resource to fit within the size limit. Ideas below.
## Motivation
3000 really isn't that big a number.
## Proposal