argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
17.53k stars 5.34k forks source link

Rollback aware auto-sync #9570

Open stereobutter opened 2 years ago

stereobutter commented 2 years ago

Summary

Currently auto-sync and rollback do not play nicely together. To perform a rollback auto-sync has to be disabled first and if reenabled happily will sync again to the commit that was rolled back previously. If argo remembered the commit that got rolled back and excluded it from auto-sync this would improve that situation a lot.

This might be the same or a similar issue as https://github.com/argoproj/argo-cd/issues/5351 but I found the issue description really unclear.

Motivation

What has to happen when a rollback is needed currently entails multiple steps to get an application back on track:

  1. disable auto-sync
  2. perform rollback
  3. revert/fix the offending commit in git
  4. reenable auto-sync

If auto-sync excluded the offending commit this would be reduced to

  1. perform rollback
  2. revert/fix the offending commit in git

Apart from saving a developer a bit of manual effort, where this feature would be really useful is for automating rollbacks. We currently tinker with using a post-sync hook for running some tests against our application and would like to automatically rollback using a fail hook (via the argo api [^1]). When a developer then releases a fix, the application would automatically sync again and everything would be back to normal.

[^1]: it appears this would currently not work without some level of indirection due to https://github.com/argoproj/argo-cd/issues/6494. Maybe one could use a argo-workflow that starts another workflow to do the rollback?

Proposal

Argo would grow a new option that when enabled would cause auto-sync to ignore a commit if it has been rolled back. For the basic use case it would probably suffice to just remember the last commit that was rolled back (compared to keeping a history of commits).

alexef commented 1 year ago

linking https://github.com/argoproj/argo-cd/issues/8686 here as a way to connect the dots of all these requests

jsirianni commented 1 year ago

This behavior surprised me. Right now, rolling back feels unsafe because Argo will happily re-deploy shortly after.

afernandez-01 commented 1 year ago

+1

mjudeikis commented 1 year ago

Im really interested how people solve this problem. In example in our stack this is how we currently do:

  1. We have staging, production git repos with argo manifest
  2. We have 2 environments, both managed by single argocd instance
  3. Each service has 2 application in argo (app-staging, app-prod)
  4. staging is auto-sync enabled always and sync from master where Prod is NOT set to auto-sync
  5. each app has "post hook" to run sync to prod. And this is where magic starts:
  6. Post hook runs mini golang cli application which does following: 6.1. Get production argo application manifest and checks it it was rollbacked:
    ```go
    // status.sync.status - tells if application is Sync/OutOfSync
    // status.sync.revision - desired state pulled from source (in our case git) automatically by argo
    // status.operationState.operation.sync.revision - current deployed revision
    // status.history - deployment history

    Pseudo code would look like this:

    
    ```go
    currentRev := resp.Status.OperationState.Operation.Sync.Revision // Current revision based on operation.
    desiredRev := resp.Status.Sync.Revision
    status := resp.Status.Sync.Status

if status == ApplicationSyncStatusSynced && currentRev == newRevision { return backoff.Permanent(fmt.Errorf("application %s is already synced to revision %s", appName, newRevision)) }

var latest int if status == ApplicationSyncStatusOutOfSync { for _, item := range resp.Status.History { if item.ID > latest { latest = item.ID } }

for _, item := range resp.Status.History {
    if currentRev == item.Revision &&
        desiredRev != currentRev &&
        item.ID != latest {
            return backoff.Permanent(fmt.Errorf("application %s rolled back to revision %s, you should sync it manually", appName, currentRev))
            }
        }
    }

    if status != ApplicationSyncStatusOutOfSync {
        return fmt.Errorf("waiting for app %s to appear out-of-sync, current_revision=%s current_status=%s", appName, currentRev, status)
    }
    return nil
})


This way we "try" (I say try as at this point its best we were able to do, so suggestion are welcome" ) to be more deterministic and not to sync over rollbacked production env. 
selcuktemizsoy commented 12 months ago

Is there any update on this topic? We have some argo workflows where we need this improvement. If someone solved it already, I will be really appreciated.

Our workflow: Run argo cd sync Run some tests after argo cd sync as post sync hook After this step we want to have auto rollback where argo cd do this automatically if post sync fails.

simply: Deploy-Post-sync if fails rollback if pass do nothing.

Kampe commented 11 months ago

Would also like to know why these two tools do not play well together on rollback..... when you need them to.

chinkung commented 6 months ago

Rollback button should be disabled (History should be view only mode) if auto-sync is enabled

AndresPinerosZen commented 5 months ago

@chinkung With the current behavior, yes. Having the rollback button enabled can confusing. But if Git reconciliation is implemented for the rollback, then it should be enabled all the time. They are asking to implement the reconciliation in Git automatically, which makes sense tbh. Just because it is GitOps doesn't mean that there can't be nice UI features to change the source of truth.

jluque0101 commented 5 months ago

This issue affects us too. Users have to go through multiple steps, including creating PRs to disable auto-sync, just to perform a successful rollback without overwriting changes.

StyleTang commented 4 months ago

Hi Team,

Can we provide a mechanism where certain parameters are not controlled by Yaml auto-sync and are only controlled by the UI?

For example, we can add an UI control only parameter syncPolicy.uiControl.blockAutomatedSync (default false).

Assuming that our service is enabled with syncPolicy, the final configuration would be:

syncPolicy:
    uiControl:
      blockAutomatedSync: false
    automated:
      prune: true
      selfHeal: true

The user story would be as follows:

Is this feasible, or does anyone have concerns or better suggestions?

Thanks!

billyatroadie commented 2 weeks ago

Here for the notifications