argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

ApplicationSet suddenly deletes applications #18780

Closed audrey-mux closed 3 months ago

audrey-mux commented 3 months ago

Describe the bug

We had a sudden deletion of a handful of applications created by appsets. The applicationset controller looks like it lost its connection to the kube-api service for less than a second. This caused errors in the application generation. The connection issue resolved quickly, but within a few seconds of the event the affected applications were deleted by the applicationset controller.

They were recreated a few seconds later, but the damage was done.

Since then, we have set the applicationset controller policy to create-update and added

  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  syncPolicy:
    preserveResourcesOnDeletion: true

to all applicationset manifests. Will that be enough to prevent deletion if this sort of error happens again?

To Reproduce

Break the applicationset controller's access to the local kube-api service.

Expected behavior

Expected at a minimum a retry, not application deletion.

Version

❯ argocd version
argocd: v2.11.3+3f344d5
  BuildDate: 2024-06-06T12:33:08Z
  GitCommit: 3f344d54a4e0bbbb4313e1c19cfe1e544b162598
  GitTreeState: clean
  GoVersion: go1.22.4
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.11.2+25f7504
  BuildDate: 2024-05-23T13:32:13Z
  GitCommit: 25f7504ecc198e7d7fdc055fdb83ae50eee5edd0
  GitTreeState: clean
  GoVersion: go1.21.9
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v5.2.1 2023-10-19T20:13:51Z
  Helm Version: v3.14.4+g81c902a
  Kubectl Version: v0.26.11
  Jsonnet Version: v0.20.0

Logs

The API connection errors

2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating application from params" applicationset=argocd/gcp-pd-csi-driver error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="{nil &ClusterGenerator{Selector:{map[csi_driver:enabled environment:staging provider:gcp] []},Template:ApplicationSetTemplate{ApplicationSetTemplateMeta:ApplicationSetTemplateMeta{Name:,Namespace:,Labels:map[string]string{},Annotations:map[string]string{},Finalizers:[],},Spec:ApplicationSpec{Source:nil,Destination:ApplicationDestination{Server:,Namespace:,Name:,},Project:,SyncPolicy:nil,IgnoreDifferences:[]ResourceIgnoreDifferences{},Info:[]Info{},RevisionHistoryLimit:nil,Sources:[]ApplicationSource{},},},Values:map[string]string{environment: staging,},} nil nil nil nil nil nil nil nil}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating params" error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="&{0xc0014ea060 {{}} 0xc00109a4e0 argocd 0xc000fa8240}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error occurred during application validation: Get \"https://172.16.192.1:443/apis/argoproj.io/v1alpha1/namespaces/argocd/appprojects/aws-eu-central-1-dop1\": http2: client connection lost" applicationset=argocd/flink-kubernetes-operator
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating application from params" applicationset=argocd/access error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="{nil &ClusterGenerator{Selector:{map[environment:staging] []},Template:ApplicationSetTemplate{ApplicationSetTemplateMeta:ApplicationSetTemplateMeta{Name:,Namespace:,Labels:map[string]string{},Annotations:map[string]string{},Finalizers:[],},Spec:ApplicationSpec{Source:nil,Destination:ApplicationDestination{Server:,Namespace:,Name:,},Project:,SyncPolicy:nil,IgnoreDifferences:[]ResourceIgnoreDifferences{},Info:[]Info{},RevisionHistoryLimit:nil,Sources:[]ApplicationSource{},},},Values:map[string]string{environment: staging,},} nil nil nil nil nil nil nil nil}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating params" error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="&{0xc0014ea060 {{}} 0xc00109a4e0 argocd 0xc000fa8240}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating application from params" applicationset=argocd/autoscaler error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="{nil &ClusterGenerator{Selector:{map[environment:staging] []},Template:ApplicationSetTemplate{ApplicationSetTemplateMeta:ApplicationSetTemplateMeta{Name:,Namespace:,Labels:map[string]string{},Annotations:map[string]string{},Finalizers:[],},Spec:ApplicationSpec{Source:nil,Destination:ApplicationDestination{Server:,Namespace:,Name:,},Project:,SyncPolicy:nil,IgnoreDifferences:[]ResourceIgnoreDifferences{},Info:[]Info{},RevisionHistoryLimit:nil,Sources:[]ApplicationSource{},},},Values:map[string]string{environment: staging,},} nil nil nil nil nil nil nil nil}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating params" error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="&{0xc0014ea060 {{}} 0xc00109a4e0 argocd 0xc000fa8240}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating params" error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="&{0xc0014ea060 {{}} 0xc00109a4e0 argocd 0xc000fa8240}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating application from params" applicationset=argocd/coredns error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="{nil &ClusterGenerator{Selector:{map[environment:staging] []},Template:ApplicationSetTemplate{ApplicationSetTemplateMeta:ApplicationSetTemplateMeta{Name:,Namespace:,Labels:map[string]string{},Annotations:map[string]string{},Finalizers:[],},Spec:ApplicationSpec{Source:nil,Destination:ApplicationDestination{Server:,Namespace:,Name:,},Project:,SyncPolicy:nil,IgnoreDifferences:[]ResourceIgnoreDifferences{},Info:[]Info{},RevisionHistoryLimit:nil,Sources:[]ApplicationSource{},},},Values:map[string]string{environment: staging,},} nil nil nil nil nil nil nil nil}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating application from params" applicationset=argocd/storage error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="{nil &ClusterGenerator{Selector:{map[environment:staging] []},Template:ApplicationSetTemplate{ApplicationSetTemplateMeta:ApplicationSetTemplateMeta{Name:,Namespace:,Labels:map[string]string{},Annotations:map[string]string{},Finalizers:[],},Spec:ApplicationSpec{Source:nil,Destination:ApplicationDestination{Server:,Namespace:,Name:,},Project:,SyncPolicy:nil,IgnoreDifferences:[]ResourceIgnoreDifferences{},Info:[]Info{},RevisionHistoryLimit:nil,Sources:[]ApplicationSource{},},},Values:map[string]string{environment: staging,},} nil nil nil nil nil nil nil nil}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating application from params" applicationset=argocd/gce-addons error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="{nil &ClusterGenerator{Selector:{map[environment:staging] []},Template:ApplicationSetTemplate{ApplicationSetTemplateMeta:ApplicationSetTemplateMeta{Name:,Namespace:,Labels:map[string]string{},Annotations:map[string]string{},Finalizers:[],},Spec:ApplicationSpec{Source:nil,Destination:ApplicationDestination{Server:,Namespace:,Name:,},Project:,SyncPolicy:nil,IgnoreDifferences:[]ResourceIgnoreDifferences{},Info:[]Info{},RevisionHistoryLimit:nil,Sources:[]ApplicationSource{},},},Values:map[string]string{environment: staging,},} nil nil nil nil nil nil nil nil}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating application from params" applicationset=argocd/rbac error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="{nil &ClusterGenerator{Selector:{map[environment:staging] []},Template:ApplicationSetTemplate{ApplicationSetTemplateMeta:ApplicationSetTemplateMeta{Name:,Namespace:,Labels:map[string]string{},Annotations:map[string]string{},Finalizers:[],},Spec:ApplicationSpec{Source:nil,Destination:ApplicationDestination{Server:,Namespace:,Name:,},Project:,SyncPolicy:nil,IgnoreDifferences:[]ResourceIgnoreDifferences{},Info:[]Info{},RevisionHistoryLimit:nil,Sources:[]ApplicationSource{},},},Values:map[string]string{environment: staging,},} nil nil nil nil nil nil nil nil}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating params" error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="&{0xc0014ea060 {{}} 0xc00109a4e0 argocd 0xc000fa8240}"
2024-06-21 16:30:14.089 time="2024-06-21T23:30:14Z" level=error msg="error generating application from params" applicationset=argocd/vault-csi-controller error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="{nil &ClusterGenerator{Selector:{map[environment:staging] []},Template:ApplicationSetTemplate{ApplicationSetTemplateMeta:ApplicationSetTemplateMeta{Name:,Namespace:,Labels:map[string]string{},Annotations:map[string]string{},Finalizers:[],},Spec:ApplicationSpec{Source:nil,Destination:ApplicationDestination{Server:,Namespace:,Name:,},Project:,SyncPolicy:nil,IgnoreDifferences:[]ResourceIgnoreDifferences{},Info:[]Info{},RevisionHistoryLimit:nil,Sources:[]ApplicationSource{},},},Values:map[string]string{environment: staging,},} nil nil nil nil nil nil nil nil}"
2024-06-21 16:30:14.088 time="2024-06-21T23:30:14Z" level=error msg="error generating params" error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="&{0xc0014ea060 {{}} 0xc00109a4e0 argocd 0xc000fa8240}"
2024-06-21 16:30:14.088 W0621 23:30:14.088600       7 reflector.go:347] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-06-21 16:30:14.088 W0621 23:30:14.088583       7 reflector.go:347] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-06-21 16:30:14.088 W0621 23:30:14.088563       7 reflector.go:347] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-06-21 16:30:14.088 W0621 23:30:14.088528       7 reflector.go:347] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: watch of *v1alpha1.ApplicationSet ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-06-21 16:30:14.088 time="2024-06-21T23:30:14Z" level=error msg="error generating application from params" applicationset=argocd/tracing error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="{nil &ClusterGenerator{Selector:{map[environment:staging] []},Template:ApplicationSetTemplate{ApplicationSetTemplateMeta:ApplicationSetTemplateMeta{Name:,Namespace:,Labels:map[string]string{},Annotations:map[string]string{},Finalizers:[],},Spec:ApplicationSpec{Source:nil,Destination:ApplicationDestination{Server:,Namespace:,Name:,},Project:,SyncPolicy:nil,IgnoreDifferences:[]ResourceIgnoreDifferences{},Info:[]Info{},RevisionHistoryLimit:nil,Sources:[]ApplicationSource{},},},Values:map[string]string{environment: staging,},} nil nil nil nil nil nil nil nil}"
2024-06-21 16:30:14.088 time="2024-06-21T23:30:14Z" level=error msg="error generating params" error="error listing clusters: Get \"https://172.16.192.1:443/api/v1/namespaces/argocd/secrets?labelSelector=argocd.argoproj.io%2Fsecret-type%3Dcluster\": http2: client connection lost" generator="&{0xc0014ea060 {{}} 0xc00109a4e0 argocd 0xc000fa8240}"
2024-06-21 16:30:14.088 W0621 23:30:14.088376       7 reflector.go:347] pkg/mod/k8s.io/client-go@v0.26.11/tools/cache/reflector.go:169: watch of *v1alpha1.Application ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding

and the deletion (app names partially redacted)

2024-06-21 16:30:19.565 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/eastus-vos1-rbac applicationset=argocd/rbac
2024-06-21 16:30:19.515 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/us-west1-vos1-coredns applicationset=argocd/coredns
2024-06-21 16:30:19.461 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/us-east-1-dos1-vault-csi-controller applicationset=argocd/vault-csi-controller
2024-06-21 16:30:19.411 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/us-west1-vos1-access applicationset=argocd/access
2024-06-21 16:30:19.356 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/eastus-vos1-gce-addons applicationset=argocd/gce-addons
2024-06-21 16:30:19.309 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/us-west4-vos1-storage applicationset=argocd/storage
2024-06-21 16:30:19.259 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/us-west4-vos1-tracing applicationset=argocd/tracing
2024-06-21 16:30:19.166 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/eastus-vos1-autoscaler applicationset=argocd/autoscaler
2024-06-21 16:30:19.076 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/us-east-1-tos1-rbac applicationset=argocd/rbac
2024-06-21 16:30:19.008 time="2024-06-21T23:30:19Z" level=info msg="Deleted application" app=argocd/us-east-1-dos1-coredns applicationset=argocd/coredns
2024-06-21 16:30:18.961 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-west4-vos1-vault-csi-controller applicationset=argocd/vault-csi-controller
2024-06-21 16:30:18.910 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/eastus-vos1-access applicationset=argocd/access
2024-06-21 16:30:18.861 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-west2-ves3-gce-addons applicationset=argocd/gce-addons
2024-06-21 16:30:18.808 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-west2-ves3-storage applicationset=argocd/storage
2024-06-21 16:30:18.758 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-west2-ves3-tracing applicationset=argocd/tracing
2024-06-21 16:30:18.661 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-east-1-tos1-autoscaler applicationset=argocd/autoscaler
2024-06-21 16:30:18.562 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-west4-vos1-rbac applicationset=argocd/rbac
2024-06-21 16:30:18.509 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-west2-ves3-coredns applicationset=argocd/coredns
2024-06-21 16:30:18.465 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-west1-vos1-vault-csi-controller applicationset=argocd/vault-csi-controller
2024-06-21 16:30:18.410 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-east-1-tos1-access applicationset=argocd/access
2024-06-21 16:30:18.359 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-east-1-dos1-gce-addons applicationset=argocd/gce-addons
2024-06-21 16:30:18.308 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-west1-vos1-storage applicationset=argocd/storage
2024-06-21 16:30:18.262 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/eastus-vos1-tracing applicationset=argocd/tracing
2024-06-21 16:30:18.161 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-west4-vos1-autoscaler applicationset=argocd/autoscaler
2024-06-21 16:30:18.066 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-east-1-dos1-rbac applicationset=argocd/rbac
2024-06-21 16:30:18.012 time="2024-06-21T23:30:18Z" level=info msg="Deleted application" app=argocd/us-east-1-tos1-coredns applicationset=argocd/coredns
2024-06-21 16:30:17.962 time="2024-06-21T23:30:17Z" level=info msg="Deleted application" app=argocd/eastus-vos1-vault-csi-controller applicationset=argocd/vault-csi-controller
2024-06-21 16:30:17.911 time="2024-06-21T23:30:17Z" level=info msg="Deleted application" app=argocd/us-west2-ves3-access applicationset=argocd/access
2024-06-21 16:30:17.857 time="2024-06-21T23:30:17Z" level=info msg="Deleted application" app=argocd/us-east-1-tos1-gce-addons applicationset=argocd/gce-addons
2024-06-21 16:30:17.811 time="2024-06-21T23:30:17Z" level=info msg="Deleted application" app=argocd/-us-east-1-dos1-storage applicationset=argocd/storage

crenshaw-dev commented 3 months ago

I think this is probably another example of this bug: https://github.com/argoproj/argo-cd/issues/18212

I think the comment by @todaywasawesome here was prescient: a generator failure might mean that it's time to stop the world.

I'm going to revert #17062 until the author has time to make it safer.

audrey-mux commented 3 months ago

Ah, yep, that’s likely it. It’s happening even with cluster generators.

audrey-mux commented 3 months ago

Hey @crenshaw-dev

Was wondering if there's an ETA on #18781 getting merged and a new release cut?

crenshaw-dev commented 3 months ago

@audrey-mux I'll cherry-pick the change to 2.11 and 2.12 and plan to cut a release today or tomorrow.