argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
18.06k stars 5.51k forks

One unhealthy application belonging to an applicationset can prevent argocd from syncing the other healthy applications #17646

Open lyesbit opened 8 months ago

lyesbit commented 8 months ago

Describe the bug

We're running into an annoying bug (or feature) in Argo CD. It can be broken down like this:

We have an ApplicationSet, let's call it customer-network-policies (definition below in the Screenshots section).

Basically, Argo CD uses the generator to loop through a directory called dev-cluster/networkpolicies and generate one NetworkPolicy per file.

Let's imagine, for instance, that the folder dev-cluster/networkpolicies/ contains 3 network policy files and, for the sake of the example, that each netpol is created in its own namespace. When everything works fine, the Argo CD UI renders something like this:

customer-network-policies (ApplicationSet) ----> customer-network-policies (Application)

                                           |----> network-policy-1 (NetworkPolicy)
                                           |
customer-network-policies (Application) ---|----> network-policy-2 (NetworkPolicy)
                                           |
                                           |----> network-policy-3 (NetworkPolicy)

So far so good.

Problem occurs here:

To Reproduce

It very much looks like one unhealthy network policy that cannot be created for whatever reason (in this case because the namespace does not exist) causes Argo CD to go bonkers: it cannot even sync/refresh the other network policies belonging to the same Application/ApplicationSet, which are absolutely fine.

We've tested this at length and this behaviour is very much reproducible, but it feels wrong.

Expected behavior

One "unhealthy" resource failing to be created/updated/synced should not have any consequences for the other "healthy" resources. Argo CD should be able to move forward with the healthy ones and only display an error message for the unhealthy one.

Screenshots

Definition of the ApplicationSet

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: customer-network-policies
spec:
  generators:
    - git:
        directories:
          - path: dev-cluster/networkpolicies
        repoURL: >-
          ssh://git@<bitbucket>/customer-instances.git
        revision: master
  syncPolicy:
    preserveResourcesOnDeletion: false
  template:
    metadata:
      name: customer-network-polices
    spec:
      destination:
        server: 'https://kubernetes.default.svc'
      project: default
      source:
        path: dev-cluster/networkpolicies
        repoURL: >-
          ssh://git@<bitbucket>/customer-instances.git
        targetRevision: master
      syncPolicy:
        automated:
          allowEmpty: true
          prune: true
          selfHeal: true
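
For illustration only, and not what we run: the template above does not reference any generator parameters, so everything under the directory ends up in a single Application. A rough sketch of the per-directory form of the git directory generator is shown below. It assumes a hypothetical layout where each network policy lives in its own subdirectory of dev-cluster/networkpolicies, and the netpol- name prefix is made up. With one Application per directory, a failure in one policy would be expected to stay within its own Application.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: customer-network-policies
spec:
  generators:
    - git:
        repoURL: ssh://git@<bitbucket>/customer-instances.git
        revision: master
        directories:
          # Assumption: one subdirectory per network policy (hypothetical layout)
          - path: dev-cluster/networkpolicies/*
  template:
    metadata:
      # One Application per matched directory; the "netpol-" prefix is hypothetical
      name: 'netpol-{{path.basename}}'
    spec:
      project: default
      source:
        repoURL: ssh://git@<bitbucket>/customer-instances.git
        targetRevision: master
        # {{path}} is filled in by the git directory generator
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```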

Error occurring when including just one netpol that cannot be synced/created: [screenshot: tempsnip]

Argo CD logs the following error:

Failed to load live state: Namespace "336dc7c4-53f6-4a01-9125-b5dd1d37e3c3-p" for NetworkPolicy "allow-from-bit-cbcd-mgmt-monitoring-72a05" is not managed

Version 2.8.4

jgwest commented 8 months ago

One thing to note: the error message `Namespace "336dc7c4-53f6-4a01-9125-b5dd1d37e3c3-p" (...) is not managed` does not necessarily mean that the Namespace doesn't exist. Rather, it means the Namespace is not in the list of Namespaces that Argo CD has access to on the cluster, as defined in the cluster secret. See the namespaces field of that Secret.
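
As a rough sketch of where that field lives, a declaratively managed cluster Secret might look like the following. The Secret name, cluster name, server URL, and namespace list are all hypothetical placeholders, and the in-cluster destination may not have such a Secret at all unless one was created explicitly.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dev-cluster-secret          # hypothetical name
  namespace: argocd                 # assumes Argo CD is installed in "argocd"
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: dev-cluster                 # hypothetical cluster name
  server: https://dev-cluster.example.com   # hypothetical API server URL
  # Comma-separated list of namespaces Argo CD may manage on this cluster.
  # A namespace missing from this list is reported as "not managed".
  namespaces: "team-a,team-b"
  config: |
    {
      "tlsClientConfig": {
        "insecure": false
      }
    }
```

If the namespaces field is empty or absent, Argo CD manages the cluster without that per-namespace restriction.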

lyesbit commented 8 months ago

Right. However, that still does not explain why that one un-syncable resource prevents Argo CD from syncing the other healthy/syncable resources that belong to the same Application/ApplicationSet. Any explanation?

lyesbit commented 8 months ago

Any update here?

andrii-korotkov-verkada commented 3 weeks ago

ArgoCD versions 2.10 and below have reached EOL. Can you upgrade and tell us if the issue is still present, please?

lyesbit commented 2 weeks ago

The problem still persists.

Argo CD: v2.11.6+089247d
Build Date: 2024-07-23T00:41:35Z
Go Version: go1.21.11 (Red Hat 1.21.11-1.module+el8.10.0+21986+2112108a)
Platform: linux/amd64
jsonnet: v0.20.0
kustomize: v5.2.1 unknown
Helm: v3.14.4+g81c902a
kubectl: v0.26.11