argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
17.94k stars 5.46k forks source link

auto-sync on monorepo effectively equals to self-heal #10679

Open dbaudracco opened 2 years ago

dbaudracco commented 2 years ago

Checklist:

Describe the bug

When enabling auto-sync any commit to the repo will trigger a sync operation once, however if the application is in a synced state when a new commit is detected no sync is performed for that commit. Once the application is marked as OutOfSync (for example if any change is manually applied to the application resources) the latest commit is immediately applied.

When using a repo with frequent updates (such as a monorepo) this effectively equals to enabling self-healing.

To Reproduce

  1. enable auto-sync
  2. deploy an application targeting a specific branch
  3. push commits to the target branch
  4. update the application resources (e.g. kubectl scale deployment my-server --replicas=1)
  5. changes are immediately overwritten (once)

Expected behavior

I expected ArgoCD to only sync when there are changes to the files used to generate manifests, and not on any commit to the branch.

Version

argocd: v2.4.11+3d9e9f2.dirty
  BuildDate: 2022-08-22T19:33:23Z
  GitCommit: 3d9e9f2f95b7801b90377ecfc4073e5f0f07205b
  GitTreeState: dirty
  GoVersion: go1.19
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.4.11+3d9e9f2.dirty
  BuildDate: 2022-08-22T19:33:23Z
  GitCommit: 3d9e9f2f95b7801b90377ecfc4073e5f0f07205b
  GitTreeState: dirty
  GoVersion: go1.19
  Compiler: gc
  Platform: darwin/amd64
  Kustomize Version: could not get kustomize version: exec: "kustomize": executable file not found in $PATH
  Helm Version: Client: v2.16.7+g5f2584f
  Kubectl Version: v0.23.1
  Jsonnet Version: v0.18.0
dbaudracco commented 2 years ago

The behaviour doesn't change when setting the annotation argocd.argoproj.io/manifest-generate-paths as instructed in the documentation. It appears that the annotation only prevents cache busting but doesn't update the revision of the last operation sync.

crenshaw-dev commented 2 years ago

@dbaudracco on your PR, you proposed modifying this function, but I didn't follow how it would fix the issue. Can you explain?

dbaudracco commented 2 years ago

My reasoning is that when a commit doesn't contain any update relevant for generating manifests, we can consider as if a sync was already performed against said commit. My idea was to compare the generated manifests against the live ones even if app.Status.OperationState.SyncResult.Revision != commitSHA

junjunjunk commented 1 year ago

I think this is a similar issue. https://github.com/argoproj/argo-cd/issues/9594

dbaudracco commented 1 year ago

Yup that looks like the same issue. In my experience it happens regardless of the manifest-generate-paths annotation however.

junjunjunk commented 1 year ago

To recap, this issue is probably caused by as below.

  1. several Applications reference the main branch.
  2. Commit changes to the main branch with changes made to files not referenced by the Application. And then, All application update "CurrentSyncStatus".(perhaps app.Status.SyncStatus .SyncResult )
  3. But there are no differences in manifests referenced by applications. So applications don't perform the sync. Then, "LAST SYNC RESULT"(perhaps app.Status.OperationState.SyncResult ) of the application is not updated.
  4. Patch the live manifests which are monitored by applications. And then, OutOfSync occurs, and immediately sync is triggered according to automated-sync-semantics.

Ref: https://argo-cd.readthedocs.io/en/stable/user-guide/auto_sync/#automated-sync-semantics

I had misunderstood manifest-generate-paths until now, but it seems that manifest-generate-paths only change the handling of the generated cache and do not affect the AutoSync logic because the doc says "webhook will not trigger application reconciliation".

then the webhook will not trigger application reconciliation and the existing cache will be considered valid for the new commit.

https://argo-cd.readthedocs.io/en/stable/operator-manual/high_availability/#webhook-and-manifest-paths-annotation

junjunjunk commented 1 year ago

To resolve this issue, one of the following patterns would be selected.

Realistically, changing semantics is too big a change in specifications. However, since manifest-generate-paths is an annotation about cache handling, it seems unhealthy to change Sync's behavior depending on it...

lblazewski commented 1 year ago

@crenshaw-dev

I am commenting on this issue to raise the concern and bump this bug. For monorepos which handle multiple environments it's a real pain because it prevents ad hoc changes needed for troubleshooting.

For example in a monorepo setup, you want to have auto sync enabled to get changes upon commits landing in the repo but you disable self heal on development argoCD applications because you often need to change a variable in configmap or adjust the deployment and you don't want ArgoCD to overwrite those changes. When other person commits into the repo that triggers deploy on any other argoCD app, it removes the others person work on the development argoCD applications!

Maybe I am missing something or is there any other way to address this issue? (The only way I know right now is to make sure that you commit all of those small changes to repo but that pollutes git hardly and does not make sense because I don't want self healing and allow messing with the manifests directly in the cluster). As mentioned above the annotation for generating manifests does not help with this.

zswanson commented 1 year ago

Observing this exact issue in 2.8.0, this is kind of a high priority issue for us. We will workaround by reverting to manual sync triggered by CI but we were really hoping to use the webhook to avoid a CI wait on these clusters.

@crenshaw-dev is there any update on brainstoming for a solution here? It seems like a docs update to point this out at least would be a good idea, because it caught us completely unaware and concerned us about how we'd be able to respond to an incident if we needed to make an immediate change

crenshaw-dev commented 1 year ago

I think that auto-sync should probably respect argocd.argoproj.io/manifest-generate-paths. Currently, that annotation only mitigates repo-server load by filtering webhooks. It doesn't prevent unnecessary auto-syncs.

The rough outline would be: 1) git diff --name-only <synced revision> <current revision> 2) Is any of those files in argocd.argoproj.io/manifest-generate-paths? If yes, sync. If not, skip.

I think the repo-server could have a new GetDiffFileList or similar method to support the application-controller when making the "sync/no-sync" decision.

We'd need some kind of flag to opt in to this behavior. Maybe an option nested in syncOptions.automated.

zswanson commented 1 year ago

Any possibility of this making it into 2.9? This completely disrupts our plans to use autosync with webhook. We have self-heal disabled so that emergency changes can be made by our system team before committing them back to git state.

crenshaw-dev commented 1 year ago

@zswanson not really. There's a WIP branch, but I think it'll be too big to cherry-pick into 2.9. https://github.com/argoproj/argo-cd/pull/15636

harrywm commented 1 year ago

I'd like to add another angle to the discussion, less of a massive pain, more of an annoyance - we manage our "Infrastructure" related applications in a mono-repo setup, and reference them using an ApplicationSet. I'm currently implementing Notifications for deployments of these apps, and noticed that due to this behaviour (i believe) we are shipped a notification for every Application that is referenced by the ApplicationSet (same branch, repository), which is very annoying and doesn't represent accurately the actual Syncs made.

This is running with a fairly standard on-deployed trigger:

when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'healthy'

And using the recommended manifest-generate-paths annotation.

Screenshot 2023-11-16 at 18 19 57

Would love to see an improvement merged in here!