Open tebaly opened 2 months ago
I can't reproduce this issue. Can you tell me how to reproduce it in detail?
This is my guess: configure a HorizontalPodAutoscaler so that it changes the number of replicas automatically. If Argo CD does not react to these automatic changes in the cluster, then there is no problem. A little later I will set these timeouts to 86400 in my cluster and show the logs.
argocd-application-controller-0:
ARGOCD_RECONCILIATION_TIMEOUT: 30000s
ARGOCD_APPLICATION_CONTROLLER_SELF_HEAL_TIMEOUT_SECONDS: 30000
I originally installed via Helm. Now I changed the timeouts in the values file and the controller restarted with the new settings. It still does something endlessly and consumes as much CPU time as it can.
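For reference, this is roughly the values change I made — a sketch assuming the community argoproj/argo-cd Helm chart; the exact keys (`configs.cm`, `controller.env`) depend on the chart version:

```yaml
# Sketch of the Helm values change (argoproj/argo-cd chart assumed).
# timeout.reconciliation in argocd-cm is what feeds ARGOCD_RECONCILIATION_TIMEOUT.
configs:
  cm:
    timeout.reconciliation: 30000s
controller:
  env:
    - name: ARGOCD_APPLICATION_CONTROLLER_SELF_HEAL_TIMEOUT_SECONDS
      value: "30000"
```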
level=info msg="Update successful" application=
level=info msg="Updated health status: Healthy -> Progressing"
level=info msg="Refreshing app status (controller refresh requested), level (1)"
level=info msg="Comparing app state ..."
level=info msg="GetRepoObjs stats" application=
There are many such lines, and they appear in batches separated by a couple of minutes. I think the processor can't go any faster; it's busy with something.
With the settings above I expect the controller to be idle, or at least for the CPU load to be lower, but nothing has changed.
Could it be that the controller has been accumulating tasks in a queue for months and is now trying to work through them all? That is, a queue of tens or hundreds of thousands of tasks that it will keep processing?
It continues doing what I described above, and the message log grows without stopping.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/component: server
    app.kubernetes.io/instance: argocd
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  ...
  resource.customizations.ignoreResourceUpdates.all: |
    jsonPointers:
      - /status
      - /metadata
This didn't change anything either.
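A narrower variant I could try next — targeting only the field the HPA mutates instead of the blanket `all` rule. This is a sketch: the `group_Kind` key format follows the Argo CD reconcile-optimization convention, and `/spec/replicas` is my assumption about which field is noisy:

```yaml
# Sketch: ignore only HPA-driven replica changes on Deployments
# when deciding whether an update should trigger a refresh.
resource.customizations.ignoreResourceUpdates.apps_Deployment: |
  jsonPointers:
    - /spec/replicas
```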
How can I debug this message: "Refreshing app status (controller refresh requested)"?
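One way to narrow this down is to tally which refresh reasons dominate the controller log. A sketch, using a hypothetical saved excerpt; in a real cluster the file would come from `kubectl -n argocd logs argocd-application-controller-0 > /tmp/controller.log`:

```shell
# Hypothetical excerpt of controller log lines, for illustration only
cat > /tmp/controller.log <<'EOF'
level=info msg="Refreshing app status (controller refresh requested), level (1)"
level=info msg="Refreshing app status (controller refresh requested), level (1)"
level=info msg="Refreshing app status (comparison expired), level (2)"
EOF

# Count occurrences of each refresh reason, most frequent first
grep -o 'Refreshing app status ([^)]*)' /tmp/controller.log | sort | uniq -c | sort -rn
```

If "controller refresh requested" dominates, something in the cluster (rather than the periodic timer) is driving the refreshes.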
I believe there is a problem with the
spec.syncPolicy.automated.selfHeal
option in Argo CD. The current implementation seems to ignore the configured timeouts and triggers checks more frequently than expected, leading to unnecessary resource consumption.
Details:
I have configured the following timeouts:
ARGOCD_RECONCILIATION_TIMEOUT: 30000s
ARGOCD_APPLICATION_CONTROLLER_SELF_HEAL_TIMEOUT_SECONDS: 30000
My expectation is that the controller will not attempt to change or check the cluster state within these timeout periods. Specifically, I expect to see no synchronization attempts in the controller logs for at least five minutes. However, this is not happening. Instead, the controller constantly attempts to check and synchronize the state, as shown in the logs below:
Hypothesis:
It appears that the controller might be responding to events in the cluster and ignoring the configured timeouts. One likely trigger for these frequent checks is automatic changes to the number of replicas managed by a HorizontalPodAutoscaler; reacting to those changes seems unnecessary in this case.
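If that hypothesis holds, one workaround I would expect to help is telling Argo CD to ignore replica-count drift on Deployments, so HPA scaling does not register as drift for selfHeal to correct. A sketch only — `my-app` is a placeholder name:

```yaml
# Sketch: Application with selfHeal enabled but replica drift ignored,
# assuming HPA-driven replica changes are what keeps triggering self-heal.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app        # hypothetical application name
  namespace: argocd
spec:
  syncPolicy:
    automated:
      selfHeal: true
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
```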
Request:
I would like the ability to disable all checks and events except for the regular timeout checks. The controller should respect the configured timeouts and not attempt to check or synchronize the cluster state during these intervals unless explicitly required.
Thank you for looking into this issue.