Open carlosrejano opened 1 month ago
I can confirm this seems to happen also in 1.11.7
and probably older versions, we sometimes have applications that stay OutOfSync
forever and would not sync with the RollingSync
option activated in our applicationset.
Today, we had such an occurence and there are a bunch of applications in "waves" preceding the one for the stuck application that have their status to Pending
although they have been healthy/synced for a while.
I forced the status to Progressing
for those and everything went back to normal and my stuck OutOfSync
application synced automatically.
I believe you might be right on other potential problems, with multiple commits.
Between the time we store the revisions here and the time we compare it here there could have been other commits/revisions changes when the sync is applied (particularly in the case of a mono repo as source for multi apps for the appset), so it will stay stuck in pending again.
I was wondering if @wparr-circle or @crenshaw-dev can confirm and if they have a take on that?
Like would there be way to just detect it's a later sync and progress the status anyway? Maybe this needs a similar feature as the metadata.generation
on the application sync status itself to detect if the "sync generation" is higher than the one we stored?
I might be completely off, but what if we overthinked all this?
Applicationset RollingSync
is ultimately meant to trigger sync on OutofSync application in a certain order right?
And this behavior can still be bypassed by manual operations at any point (I can sync any app I want manually).
So if an app is synced and healthy whatever the status of the app in the applicationset, its job is done isn't it? So should we care at all to check revisions or times, couldn't we just check the application syncstatus and healthstatus?
Here is an over simpified graph, we already have some of those transitions (waiting and progressing)
Indeed, it would be interesting to understand the strategy(ies) that should be followed for the progressive sync, in particular when the ApplicationSet spec is updated or that the targetRevision
resolves to a new sha during a rolling sync.
Personally, I mostly foresee 2 strategies
With those strategies in mind, what I don't get clear is the signification of ApplicationSet.status.ApplicationStatus[].TargetRevisions
.
Does it aim at reflecting:
My 2 cents is that to achieve the first strategy, we need to ensure that all applications are effectively healthy with the latest revision. If they are not healthy, we need to go, in order, through all the applications and sync them.
This means a high level algorithm like this:
latestTargetRevisions = getCurrentTargetRevision(appSet)
appSet.status = getAppSetApplicationStatuses(appSet)
for appStatus in appSet.status :
if appStatus.targetRevisions != latestTargetRevisions:
syncApp(appStatus.Name)
break // we only update one app at a time
if not appStatus.Ready || not appStatus.Healthy:
break // We hold until the app is ready and healthy
This makes me wonder why the appStatus targetRevision is not updated when the app status is Waiting or Pending. Shouldn't it always be up to date?
Strategy 2 is quite more complex as it requires to store a queue of updates to be performed (and wonder how to handle possible failing deployments (update worked for the first applications, but failed in the middle)
Well my point is that the revisions themselves unless I'm mistaken are not important at the applicationset level.
I believe we start using those in 2.12 because we were stuck in pending states in 2.11 sometimes, before we were using sync date/time comparaison, but the transition logic suffer the same problem in both, if someone do a manual sync during a rolling sync on one of the app or if other commit comes in (mono repo setup is highly subject to that), then the condition to get out of the pending state is never met.
To my understanding, the application controller is already handling all the follow-up there and will flag an application as outOfSync if it needs a sync. So the applicationset controller shouldn't care which revision or date, it should only care to trigger a sync on outofsync applications in a certain order.
i.e. I think our only problem is the state machine transitions and conditions.
And we already have a "queue" system, the applicationset application statuses is basically a backup of the queue between reconciliation. It goes like that:
One block is basically one reconcile of the applicationset. Pending is the state where a sync is triggered by adding an operation on the application, the application controller is responsible to execute the sync itself. So if at any stage the applicationset is updated or an application health or sync state change, it is reevaluated in each reconciliation so that part should already works.
I believe it's really only the transition conditions that should simply not take into account revision or time but just application outofsync, health and operation statuses.
I started a branch to test those simplifications but couldn't advance much in my calendar 😥
And also...it's highly possible that I completely misunderstood the logic and needs 😁
In my experience, progressive syncs goal is to reduce the risk of synchronising an application (i.e. propagating an update).
In this sense, I see that revisions matters (the immutable ones like commit sha or helm versions), to understand and control what is being applied/deployed and where.
Hum I don't think that's the role of argoCD, in your example:
Say commit A bumps a controller that adds a CRD and run a migration script, commit B drops support for the legacy behaviour. You want to ensure that commit A is successfully applied to all clusters before commit B is applied.
That sounds like you need something above argoCD like kargo.
The progressive sync in the applicationset only add logic on how to deploy applications relative to other applications, not between 2 commits for the same application.
In case of using git generators to deploy several applications, I would agree.
This said, when using cluster generators to deploy the same application into multiple clusters, I believe ApplicationSet converts themselves de-facto into an application deployment orchestrator.
Means it needs to have some mechanisms to control the progression of a release, similarly to what Deployments, StatefulSets or DaemonSet do.
This need is, IMO, increased with the use of indirect targetRevisions like HEAD
, main
or my-branch
which may point to different commit sha depending on the time at which the sync
action was requested
Describe the bug
We have recently upgraded to ArgoCD
2.12
. We are using ApplicationSets withRollingSync
strategy to deploy our applications. Since the upgrade we have found that the rolling sync gets stuck after performing the first step. This seems to be because the targetVersions in the applicationset status do not get updated with the latest one.We've found the following issue.
The applicationset app status stays in
Pending
state even when the app has been synced and is healthy:This is the application status operation state:
The latest commit is the one in the application status. The one in the applicationset status is older. I think that's the reason why it never continues because the revisions do not match.
One of the errors in the logs I see is the following, it's happening from time to time:
I believe there is a problem there when trying to update the applicationset status.
Sharing a bit more of information of our use case, the applicationset generates the apps from a github repository which has multiple merges a day, this makes that normally there is a merge and a new generation of apps in the middle of a rolling sync. I think there could be an issue there too.
Expected behavior
The rolling sync should continue properly after the first step.