argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Temporary per-resource sync disable #7975

Open unb9rn opened 2 years ago

unb9rn commented 2 years ago

Summary

Having an annotation like Flux's (fluxcd.io/ignore: "true") would be great. It would allow users to tamper with dev resources in real time, as well as integrate Argo CD with tools like Okteto.

Motivation

I am trying to set up in-cluster app development with Okteto. It substitutes the running deployment with a development one while syncing local code to the remote pod. I don't want to disable the self-heal feature altogether, but it would be great to make Argo CD "forget" about any changes while the developer works inside the dev container.

Proposal

Implementing some form of annotation should solve this problem.
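
For reference, this is roughly what the Flux annotation mentioned in the summary looks like when set on a resource; the names below are illustrative placeholders, and the ask is for an Argo CD equivalent:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-dev-app              # illustrative placeholder name
  annotations:
    fluxcd.io/ignore: "true"    # Flux skips this resource while the annotation is present
spec:
  selector:
    matchLabels:
      app: my-dev-app
  template:
    metadata:
      labels:
        app: my-dev-app
    spec:
      containers:
      - name: app
        image: my-dev-image     # illustrative placeholder image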

rosscdh commented 2 years ago

+1. For example:

  1. an app creates a namespace
  2. we don't want that namespace to be deleted if the app gets deleted (as other apps may be installed into that namespace)

evoosa commented 2 years ago

bumping

crenshaw-dev commented 2 years ago

Is this equivalent to just disabling auto-sync, or would you also like to disable manual syncs?

dont want that ns to be deleted if the app gets deleted (as other apps may be installed into that ns)

@rosscdh this seems like a different request, basically to prevent pruning.
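
For the namespace case specifically, a minimal sketch using the existing sync-option annotation (the namespace name is an illustrative placeholder); it tells Argo CD not to prune the resource during syncs:

apiVersion: v1
kind: Namespace
metadata:
  name: shared-namespace    # illustrative placeholder name
  annotations:
    # existing Argo CD sync option: never delete this resource as part of sync pruning
    argocd.argoproj.io/sync-options: Prune=false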

evoosa commented 2 years ago

Yes, I am looking to disable auto-sync, but only for specific resources, not for a whole app. We have an app of apps, so I can't control each app individually anyway. I found a way to ignore a type of resource completely across a whole cluster, by adding this to the argocd-cm ConfigMap:

data:
  resource.exclusions: |
    - apiGroups:
      - "apps"
      kinds:
      - Deployment
      clusters:
      - "<CLUSTER_URL>"

BUT what I need is a way to ignore specific resources independent of their type, and only on demand.

crenshaw-dev commented 2 years ago

Thanks for the explanation!

for some reason it's inconsistent

Are there any downsides to argocd.argoproj.io/compare-options: IgnoreExtraneous besides the fact that it seems to be buggy? If not, would you like to detail that issue, and maybe we can address it as a bug rather than an enhancement?
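
For context, the annotation under discussion is set on the live resource itself; a minimal sketch, with an illustrative ConfigMap name:

apiVersion: v1
kind: ConfigMap
metadata:
  name: generated-config    # illustrative placeholder name
  annotations:
    # existing Argo CD compare option: a resource that exists in the cluster
    # but not in the Application's source no longer makes the app OutOfSync
    argocd.argoproj.io/compare-options: IgnoreExtraneous
data:
  example-key: example-value    # illustrative data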

evoosa commented 2 years ago

Hey, I'd love to add some details regarding this issue; it persists. It happened as follows:

  1. One of our developers used Okteto to create a deployment named control. He added the annotation to his Okteto YAML.
  2. Suddenly, he got the following error from Okteto:
    Follow these steps:
          1. Execute 'okteto down'
          2. Apply your manifest changes again: 'kubectl apply'
          3. Execute 'okteto up' again
    More information is available here: https://okteto.com/docs/reference/known-issues/#kubectl-apply-changes-are-undone-by-okteto-up

it's important to mention that the deployment yaml didn't change in git in any branch, and wasn't synced manually at the time.

  3. I looked at the events of the control app in the Argo CD UI and found the following logs:
    REASON              MESSAGE                                            COUNT  FIRST OCCURRED                  LAST OCCURRED
    ScalingReplicaSet   Scaled down replica set control-598448fcc7 to 0   6      1d ago (Yesterday at 11:48 AM)  20m ago (Today at 11:40 AM)
    ScalingReplicaSet   Scaled up replica set control-9c565658c to 1      7      1d ago (Yesterday at 11:48 AM)  20m ago (Today at 11:40 AM)
    ScalingReplicaSet   Scaled up replica set control-598448fcc7 to 1     6      1d ago (Yesterday at 11:48 AM)  33m ago (Today at 11:26 AM)
    ScalingReplicaSet   Scaled down replica set control-bcf7d485c to 0    9      24d ago (06/23/2022)            33m ago (Today at 11:26 AM)
    ScalingReplicaSet   Scaled down replica set control-9c565658c to 0    6      1d ago (Yesterday at 11:48 AM)  34m ago (Today at 11:26 AM)
    ScalingReplicaSet   Scaled up replica set control-bcf7d485c to 1      9      25d ago (06/22/2022)            34m ago (Today at 11:26 AM)

In other words, the ReplicaSet was scaled up and down for an unknown reason :\

  4. I looked at the cluster's events in the developer's namespace and found the following lines:
    28m         Normal    Scheduled                pod/control-9c565658c-tlrkp            Successfully assigned camel/control-9c565658c-tlrkp to ip-192-168-24-93.ec2.internal
    28m         Normal    SuccessfulCreate         replicaset/control-9c565658c           Created pod: control-9c565658c-tlrkp
    28m         Normal    Pulled                   pod/control-9c565658c-tlrkp            Container image "okteto/bin:1.3.3" already present on machine
    28m         Normal    Created                  pod/control-9c565658c-tlrkp            Created container okteto-bin
    28m         Normal    Started                  pod/control-9c565658c-tlrkp            Started container okteto-bin
    28m         Normal    Created                  pod/control-9c565658c-tlrkp            Created container okteto-init-volume
    28m         Normal    Started                  pod/control-9c565658c-tlrkp            Started container okteto-init-volume
    28m         Normal    Pulled                   pod/control-9c565658c-tlrkp            Container image "okteto/node:14" already present on machine
    28m         Normal    Pulling                  pod/control-9c565658c-tlrkp            Pulling image "723128751635.dkr.ecr.us-east-1.amazonaws.com/control:latest"
    28m         Normal    Pulled                   pod/control-9c565658c-tlrkp            Successfully pulled image "723128751635.dkr.ecr.us-east-1.amazonaws.com/control:latest"
    28m         Normal    Created                  pod/control-9c565658c-tlrkp            Created container app
    28m         Normal    Started                  pod/control-9c565658c-tlrkp            Started container app
    28m         Normal    Scheduled                pod/control-bcf7d485c-qm52k            Successfully assigned camel/control-bcf7d485c-qm52k to ip-192-168-111-132.ec2.internal
    28m         Normal    SuccessfulCreate         replicaset/control-bcf7d485c           Created pod: control-bcf7d485c-qm52k
    28m         Normal    Pulling                  pod/control-bcf7d485c-qm52k            Pulling image "723128751635.dkr.ecr.us-east-1.amazonaws.com/control:latest"
    28m         Normal    Pulled                   pod/control-bcf7d485c-qm52k            Successfully pulled image "723128751635.dkr.ecr.us-east-1.amazonaws.com/control:latest"
    28m         Normal    Created                  pod/control-bcf7d485c-qm52k            Created container app
    28m         Normal    Started                  pod/control-bcf7d485c-qm52k            Started container app
    28m         Normal    Killing                  pod/control-9c565658c-tlrkp            Stopping container app
    28m         Normal    SuccessfulDelete         replicaset/control-9c565658c           Deleted pod: control-9c565658c-tlrkp
    27m         Normal    Killing                  pod/control-bcf7d485c-qm52k            Stopping container app
    27m         Normal    SuccessfulDelete         replicaset/control-bcf7d485c           Deleted pod: control-bcf7d485c-qm52k
    27m         Normal    Scheduled                pod/control-598448fcc7-k9tsg           Successfully assigned camel/control-598448fcc7-k9tsg to ip-192-168-24-93.ec2.internal
    27m         Normal    SuccessfulCreate         replicaset/control-598448fcc7          Created pod: control-598448fcc7-k9tsg
    27m         Normal    SuccessfulAttachVolume   pod/control-598448fcc7-k9tsg           AttachVolume.Attach succeeded for volume "pvc-0a3ef695-92b3-49a3-bdb9-705e9d71c0ef"
    27m         Normal    Created                  pod/control-598448fcc7-k9tsg           Created container okteto-bin
    27m         Normal    Pulled                   pod/control-598448fcc7-k9tsg           Container image "okteto/bin:1.3.3" already present on machine
    27m         Normal    Started                  pod/control-598448fcc7-k9tsg           Started container okteto-bin
    27m         Normal    Started                  pod/control-598448fcc7-k9tsg           Started container okteto-init-volume
    27m         Normal    Created                  pod/control-598448fcc7-k9tsg           Created container okteto-init-volume
    27m         Normal    Pulled                   pod/control-598448fcc7-k9tsg           Container image "okteto/node:14" already present on machine
    27m         Normal    Created                  pod/control-598448fcc7-k9tsg           Created container app
    27m         Normal    Pulled                   pod/control-598448fcc7-k9tsg           Successfully pulled image "okteto/node:14"
    27m         Normal    Started                  pod/control-598448fcc7-k9tsg           Started container app
    27m         Normal    Pulling                  pod/control-598448fcc7-k9tsg           Pulling image "okteto/node:14"
    14m         Warning   FailedAttachVolume       pod/control-9c565658c-5wxmp            Multi-Attach error for volume "pvc-0a3ef695-92b3-49a3-bdb9-705e9d71c0ef" Volume is already exclusively attached to one node and can't be attached to another
    14m         Normal    SuccessfulCreate         replicaset/control-9c565658c           Created pod: control-9c565658c-5wxmp
    14m         Normal    Killing                  pod/control-598448fcc7-k9tsg           Stopping container app
    14m         Normal    SuccessfulDelete         replicaset/control-598448fcc7          Deleted pod: control-598448fcc7-k9tsg
    14m         Normal    Scheduled                pod/control-9c565658c-5wxmp            Successfully assigned camel/control-9c565658c-5wxmp to ip-192-168-8-128.ec2.internal
    13m         Normal    SuccessfulAttachVolume   pod/control-9c565658c-5wxmp            AttachVolume.Attach succeeded for volume "pvc-0a3ef695-92b3-49a3-bdb9-705e9d71c0ef"
    13m         Normal    Started                  pod/control-9c565658c-5wxmp            Started container okteto-bin
    13m         Normal    Created                  pod/control-9c565658c-5wxmp            Created container okteto-bin
    13m         Normal    Pulled                   pod/control-9c565658c-5wxmp            Container image "okteto/bin:1.3.3" already present on machine
    13m         Normal    Pulled                   pod/control-9c565658c-5wxmp            Container image "okteto/node:14" already present on machine
    13m         Normal    Created                  pod/control-9c565658c-5wxmp            Created container okteto-init-volume
    13m         Normal    Started                  pod/control-9c565658c-5wxmp            Started container okteto-init-volume
    13m         Normal    Pulling                  pod/control-9c565658c-5wxmp            Pulling image "723128751635.dkr.ecr.us-east-1.amazonaws.com/control:latest"
    13m         Normal    Pulled                   pod/control-9c565658c-5wxmp            Successfully pulled image "723128751635.dkr.ecr.us-east-1.amazonaws.com/control:latest"
    13m         Normal    Started                  pod/control-9c565658c-5wxmp            Started container app
    13m         Normal    Created                  pod/control-9c565658c-5wxmp            Created container app

    This is the deployment's active manifest AFTER Okteto crashed:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "49"
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"control","namespace":"fox"},"spec":{"selector":{"matchLabels":{"name":"control"}},"template":{"metadata":{"labels":{"name":"control"}},"spec":{"containers":[{"envFrom":[{"configMapRef":{"name":"control"}},{"configMapRef":{"name":"general"}},{"secretRef":{"name":"control"}},{"secretRef":{"name":"jwt"}}],"image":"723128751635.dkr.ecr.us-east-1.amazonaws.com/control:latest","imagePullPolicy":"Always","name":"app","ports":[{"containerPort":3000}]}]}}}}
      creationTimestamp: "2021-08-01T10:41:23Z"
      generation: 148
      name: control
      namespace: fox
      resourceVersion: "130144817"
      selfLink: /apis/apps/v1/namespaces/fox/deployments/control
      uid: 1b90a4f4-fa95-44f5-9e6d-9e8600e8cd84
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          name: control
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          annotations:
            kubectl.kubernetes.io/restartedAt: "2021-08-19T10:18:35+03:00"
          creationTimestamp: null
          labels:
            name: control
        spec:
          containers:
          - envFrom:
            - configMapRef:
                name: control
            - configMapRef:
                name: general
            - secretRef:
                name: control
            - secretRef:
                name: jwt
            image: 723128751635.dkr.ecr.us-east-1.amazonaws.com/control:latest
            imagePullPolicy: Always
            name: app
            ports:
            - containerPort: 3000
              protocol: TCP
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    status:
      availableReplicas: 1
      conditions:
      - lastTransitionTime: "2021-08-01T10:41:23Z"
        lastUpdateTime: "2022-02-16T17:18:45Z"
        message: ReplicaSet "control-796fc564b5" has successfully progressed.
        reason: NewReplicaSetAvailable
        status: "True"
        type: Progressing
      - lastTransitionTime: "2022-03-13T20:35:34Z"
        lastUpdateTime: "2022-03-13T20:35:34Z"
        message: Deployment has minimum availability.
        reason: MinimumReplicasAvailable
        status: "True"
        type: Available
      observedGeneration: 148
      readyReplicas: 1
      replicas: 1
      updatedReplicas: 1

Any idea why the ReplicaSet would scale up and down? We'd really appreciate your help, and we'd love to provide more information if necessary!

vl-kp commented 1 year ago

Is there an annotation that disables auto-sync?

thesuperzapper commented 4 months ago

An explicit argocd.argoproj.io/sync-options: Ignore=true (similar to the existing Prune=false) would be great, as it would let people annotate a resource that needs to be manually changed (e.g. during an emergency or a test) and have Argo CD not keep updating it.
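
A sketch of how this might look if it reused the existing sync-options annotation syntax; note that Ignore=true is only the suggestion made here and is not an option Argo CD currently supports:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: control    # the example deployment from this thread
  annotations:
    # proposed, not implemented: Argo CD would skip this resource during
    # syncs and self-heal, leaving manual/emergency changes untouched
    argocd.argoproj.io/sync-options: Ignore=true
# (rest of the Deployment spec unchanged)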

thesuperzapper commented 4 months ago

@crenshaw-dev Just so we are clear, using argocd.argoproj.io/compare-options: IgnoreExtraneous does not stop Argo CD from updating the resource; it only stops Argo CD from reporting an application as "out of sync" when that resource exists in the cluster but not in the application source.

That is to say, for the use case of disabling syncing for a specific resource, argocd.argoproj.io/compare-options: IgnoreExtraneous does nothing at all, because the resource will exist in the source and therefore be updated with every sync.

vhurtevent commented 3 months ago

Hello, I'm also interested in this feature.

Our use case: temporarily disable Argo CD sync on resources during a maintenance window, which could be: