Open stephaneetje opened 2 days ago
@stephaneetje my apologies, as I seem to have misunderstood the nature of your loop. With the previous discussion having taken place in a thread about desiredCommit, I was misled into believing that to have been an actual factor in your problem. If you leave that field alone, as you are currently, AND if there is no commit in your Freight, then when syncing an App, Kargo has no notion of what specific revision that App should observably be synced to once complete.
This means the cause of your loop has nothing to do with a mismatch of the desired revision.
The continuous syncing is probably due to the argocd-update step syncing, then, when checking whether the sync completed, discovering that the most recently completed sync wasn't the one it asked for. It's waiting for its request to complete and seeing a sync completed, but that operation must have been initiated by someone or something else, so it tries again.
Do you happen to have auto-sync enabled on your Apps? It is possible that is what the update step is fighting a losing battle against. We do generally suggest disabling auto-sync on Apps when you use Kargo. I will address it in the docs to make that clearer.
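For anyone following along, here is a minimal sketch of what that looks like on the Application side; the name, repo URL, and paths are placeholders. The point is that spec.syncPolicy.automated is left out (or removed), since the argocd-update step is the one that should initiate syncs. The kargo.akuity.io/authorized-stage annotation is what allows Kargo to operate on the App at all.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: scim-staging
  namespace: argocd
  annotations:
    # allows the staging Stage in the scim Project to manage this App
    kargo.akuity.io/authorized-stage: scim:staging
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git   # placeholder
    targetRevision: main
    path: env/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: scim
  syncPolicy:
    # automated:        # leave auto-sync off; Kargo initiates syncs instead
    #   prune: true
    #   selfHeal: true
    syncOptions:
    - CreateNamespace=true
```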
If this ends up not being the issue, enabling debug logging on the controller should provide more insight into what conditions are causing the update step to be caught in this sync loop.
I did deactivate auto-sync. I had already enabled DEBUG logs, but they didn't help:
time="2024-11-20T11:44:58Z" level=debug msg="enqueued Promotion for reconciliation" app=scim-staging namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a
time="2024-11-20T11:44:58Z" level=debug msg="reconciling Promotion" namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a
time="2024-11-20T11:44:58Z" level=debug msg="continuing Promotion" freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:44:58Z" level=debug msg="executing argocd-update promotion step" freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:44:58Z" level=debug msg="patched Argo CD Application" app=scim-staging freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:44:58Z" level=debug msg="done executing argocd-update promotion step" freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:44:58Z" level=debug msg=promotion freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim phase="\"Running\"" promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging targetFreight=176d90ae395fe87b5d0117b96ce24cf135601a11
time="2024-11-20T11:45:35Z" level=debug msg="enqueued Promotion for reconciliation" app=scim-staging namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a
time="2024-11-20T11:45:35Z" level=debug msg="reconciling Promotion" namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a
time="2024-11-20T11:45:35Z" level=debug msg="continuing Promotion" freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:45:35Z" level=debug msg="executing argocd-update promotion step" freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:45:35Z" level=debug msg="patched Argo CD Application" app=scim-staging freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:45:35Z" level=debug msg="done executing argocd-update promotion step" freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:45:35Z" level=debug msg=promotion freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim phase="\"Running\"" promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging targetFreight=176d90ae395fe87b5d0117b96ce24cf135601a11
time="2024-11-20T11:45:50Z" level=debug msg="enqueued Promotion for reconciliation" app=scim-staging namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a
time="2024-11-20T11:45:50Z" level=debug msg="reconciling Promotion" namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a
time="2024-11-20T11:45:50Z" level=debug msg="continuing Promotion" freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:45:50Z" level=debug msg="executing argocd-update promotion step" freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:45:50Z" level=debug msg="patched Argo CD Application" app=scim-staging freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:45:50Z" level=debug msg="done executing argocd-update promotion step" freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging
time="2024-11-20T11:45:50Z" level=debug msg=promotion freight=176d90ae395fe87b5d0117b96ce24cf135601a11 namespace=scim phase="\"Running\"" promotion=staging.01jd4nrbdn19tdabv0t3the2e6.176d90a stage=staging targetFreight=176d90ae395fe87b5d0117b96ce24cf135601a11
EDIT: I verified and the sync is coming from Kargo (INITIATED BY: automated sync policy)
I verified and the sync is coming from Kargo
Oh. I had no doubt of that.
The mystery is why Kargo is syncing over and over again -- which would usually be because:

1. The argocd-update step asks Argo CD to sync the App.
2. It waits for a sync operation to complete.
3. When one completes, it checks whether it was the sync it requested; if not, it requests another.

I'm trying to piece together what's going wrong at step 3, and it was conjecture that something besides Kargo might (additionally) be initiating sync operations.
I am attempting to replicate this.
Running a build with #2936 in it should make it much easier to trace the reason without having to try to reproduce it.
Also, I have tried quite hard to replicate this.
I have 13 Projects x 3 Stages per project, each with an associated Argo CD App.
All 39 Stages are reading from and writing to just one single branch.
Here's a gist of what one Project looks like:
https://gist.github.com/krancour/dafa7ded1b4db4c0683c05e7f01129e8
There are 12 more just like it.
I've been unable to trigger any kind of infinite sync loop.
I'll note the problem I did run into:
With lots and lots of Promos running concurrently, all racing to push to the same remote branch, unsurprisingly, some of those failed because in the space between a Promotion checking out the branch and pushing to it, some other Promotion pushed its own commit. Re-trying any Promotion that failed for these reasons put things back on track. This will be a problem for anyone who has many Stages making updates to a single branch and a large volume of Promotions. (Add this to the list of problems that are solved for free by embracing the notion of branches as env-specific config storage.)
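To illustrate that last parenthetical, here is a hypothetical sketch of the branch-per-environment pattern, using Kargo's git-push step and the ctx.stage expression; the path and branch naming scheme are placeholders. Each Stage pushes to its own branch, so concurrent Promotions for different Stages never race to push to the same ref.

```yaml
- uses: git-push
  config:
    path: ./out                             # working tree prepared by earlier steps
    targetBranch: stage/${{ ctx.stage }}    # one branch per Stage/environment
```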
Not to get too sidetracked here, but there's a limited amount we can do about the above because Promotion processes are user-defined and the actions carried out by each step are opaque to the rest of Kargo. This means there's no possibility of "try again starting from step n," but I will open a separate issue with a proposal for a "FailurePolicy" that could have, among other options, the option to start over from step 0.
Circling back to the subject at hand, as I said, I couldn't reproduce that issue, so it would be interesting, @stephaneetje, if you are able to identify what you and I have done differently. I attempted to replicate your configuration as best I could, but there's clearly some consequential difference.
Running a build with #2936 in it should make it much easier to trace the reason without having to try to reproduce it.
ghcr.io/hiddeco/kargo:1.0.4-rc.66550612, built from our release-1.0 branch, should log the reason for the infinite loop, if any.
I have the same problem and this is what I found in the logs:
time="2024-11-20T18:21:15Z" level=debug msg="sync result revisions [0b8559a96d3441da09b2c84f6bd85f8fa06e56d1] do not match desired revisions [62429f4247599ff1746023c29e4514e24dee322e]" freight=d82c676db847dce25b54c3328801ec10fae9592b namespace=kargo-poc-tags promotion=teststable-tags.01jd5cw76w9waj367f69rsk1ek.d82c676 stage=teststable-tags
Here 0b8559 is the head of the branch (with the new manifests in a separate folder) and 62429f is the commit the Freight originated from.
Is it by any chance related to using the same source branch to merge the results back to?
@krancour I just tried with only the name:

```yaml
- uses: argocd-update
  config:
    apps:
    - name: scim-staging
```
The step passes fine, but I end up with the Application being synced successfully, then going OutOfSync (the image has not been updated). I checked the revision used by the sync; it is the correct one, with the image change.
I have to do another manual sync.
Also, the Kargo documentation says that apps[].sources[].repoURL is mandatory; that's why I had it before, and because I thought it would make sure the synced commit is the one from the previous step.
The step passes fine, but I end up with the Application being synced successfully, then going OutOfSync (the image has not been updated). I checked the revision used by the sync; it is the correct one, with the image change.
This has the appearance of an Argo CD quirk/issue and not a Kargo issue.
Also, the Kargo documentation says that apps[].sources[].repoURL is mandatory; that's why I had it before, and because I thought it would make sure the synced commit is the one from the previous step.
Ahhhhh. This explains everything. 😄 I had not noticed that source update.
sources isn't mandatory. repoURL is mandatory on each source, if you specify any, that is. That is for updating individual sources of the Application -- which is something you have no need to do. All you want to do is trigger a sync.

When you do perform any kind of update on a source, there's a whole additional layer of checks when determining whether the sync was successful: it compares what the sources look like to what you want them to look like. The discrepancy was there (somehow?), and because of it the sync was being re-attempted over and over again, because it kept looking like it hadn't worked.
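To make the distinction concrete, here is a rough sketch of the two shapes of the step side by side; the App name is the one from this thread, the repo URL is a placeholder, and the revision-pinning field is only mentioned because it appeared earlier in the discussion.

```yaml
# 1. Sync-only: triggers a sync of the App and waits for that exact sync
#    operation to complete. No source comparison is involved.
- uses: argocd-update
  config:
    apps:
    - name: scim-staging

# 2. Source update: also rewrites one of the App's sources, so the step
#    additionally verifies that the App's sources (and the synced revision)
#    match what was requested. Only needed when you want Kargo to modify the
#    source itself, not merely trigger a sync.
- uses: argocd-update
  config:
    apps:
    - name: scim-staging
      sources:
      - repoURL: https://example.com/org/repo.git   # placeholder
        # revision-pinning fields (e.g. desiredCommitFromStep, as used
        # earlier in this thread) would go here
```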
I opened an Argo CD issue, but I feel like Kargo is syncing before refreshing when I look at the Application events.
Does Kargo wait for the refresh to finish before sending a sync signal?
We patch the Application with a sync operation. When Argo CD syncs, it is smart about not relying on its cache. So the short answer is no. The longer answer is, "no because there's no need."
After the sync is initiated, Kargo follows up on it as many times as necessary to find evidence that the exact sync operation it initiated has since completed.
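Roughly speaking, and purely as an illustration of the mechanism rather than the exact patch Kargo sends, that means setting the Application's operation field and later checking the recorded result under status.operationState; the commit SHA below is just an example value.

```yaml
# initiating the sync on the Application
operation:
  initiatedBy:
    username: kargo            # illustrative
  sync:
    revisions:
    - 0b8559a96d3441da09b2c84f6bd85f8fa06e56d1   # example commit SHA
---
# what is later inspected to confirm the requested sync has completed
status:
  operationState:
    phase: Succeeded
    syncResult:
      revisions:
      - 0b8559a96d3441da09b2c84f6bd85f8fa06e56d1
```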
It seems the App starts showing OutOfSync only when a sync is already running, which indicates there may be some race. I don't know the exact reason, though.
Hello,
I've been trying to use Kargo to promote only image tags (using helm-image-update), but on all my stages I get an infinite loop at the argocd-update step.
Here is my stage:
I also tried removing desiredCommitFromStep, but it didn't change anything. My Warehouse is only subscribing to an image repository. You can see in the Argo CD app's events that it is looping.
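For context, a Warehouse that subscribes only to an image repository looks roughly like the generic illustration below (placeholder values, not the reporter's exact config); the resulting Freight carries an image tag but no git commit, which is the situation described earlier in this thread.

```yaml
apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: scim
  namespace: scim
spec:
  subscriptions:
  - image:
      repoURL: registry.example.com/org/scim   # placeholder image repo
      semverConstraint: ^1.0.0                 # placeholder tag constraint
```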
Discussion has begun on this issue; I'm opening another one, as asked by @krancour.