crossplane-contrib / provider-argocd

Crossplane provider to provision and manage Argo CD objects
Apache License 2.0

Fail to Recreate Deleted Cluster in ArgoCD v2.9.1 #116

Closed jefflantz closed 9 months ago

jefflantz commented 9 months ago

What happened?

First off, I have seen that the provider is configured to use ArgoCD v2.8.4. However, I ran into unexpected behavior with 2.9.1 that will need to be addressed eventually, so I'm raising this issue now.

The behavior I saw: if a user deletes a Cluster in the ArgoCD server that was managed by provider-argocd, the Cluster is not recreated and stays in the state synced: false. Specifically, the error shown is:

Warning  CannotObserveExternalResource  2m12s (x376 over 3d7h)  managed/cluster  cannot get Argocd Cluster: rpc error: code = PermissionDenied desc = permission denied

In the controller logs, I see

2023-11-27T16:25:08Z    DEBUG   provider-argocd Cannot observe external resource        {... "error": "cannot get Argocd Cluster: rpc error: code = PermissionDenied desc = permission denied", "errorVerbose": "rpc error: code = PermissionDenied desc = permission denied\ncannot get Argocd Cluster\*external).Observe\n\\*Reconciler).Reconcile\n\\*Controller).Reconcile\n\\*Controller).reconcileHandler\n\\*Controller).processNextWorkItem\n\\*Controller).Start.func2.2\n\\nruntime.goexit\n\truntime/asm_amd64.s:1594"}

Instead of throwing an error, I would expect the Cluster to be recreated.

How can we reproduce it?

I'm not sure exactly which version introduced the break; I have only looked at the release notes. We upgraded 2.7.7 -> 2.9.1, so you can try with ArgoCD 2.9.1. Also note that we're using the latest build of the provider, which came after this PR; that PR could be related.

To go through our setup, first we added a user in our ArgoCD RBAC with the following permissions:

policy.csv: |
      p, <user>, clusters, *, *, allow

Then ran the following commands:

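# ARGOCD_SERVER must already hold the Argo CD base URL; <user> is a placeholder.
ARGOCD_ADMIN_SECRET=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
# Log in as admin to obtain a session token (-k skips TLS verification).
ARGOCD_ADMIN_TOKEN=$(curl -s -X POST -k -H "Content-Type: application/json" --data '{"username":"admin","password":"'"$ARGOCD_ADMIN_SECRET"'"}' "${ARGOCD_SERVER}/api/v1/session" | jq -r .token)
# Generate an API token for <user>; the URL is quoted so the shell does not treat < and > as redirections.
curl -s -X POST -k -H "Authorization: Bearer $ARGOCD_ADMIN_TOKEN" -H "Content-Type: application/json" "${ARGOCD_SERVER}/api/v1/account/<user>/token" | jq -r .token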

We then used this token to create a secret and referenced it in a provider config called argocd.
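The secret and provider config step could look roughly like the sketch below. It is not the exact configuration we used: the Secret name `argocd-credentials`, the namespace `crossplane-system`, the key `authToken`, and the variables `ARGOCD_USER_TOKEN` and `ARGOCD_HOST` are hypothetical placeholders, and `insecure: true` is an assumption that mirrors the `-k` flag in the curl commands. The field names follow the provider's ProviderConfig example as I understand it, so check them against the installed CRD.

```shell
# Sketch only: prints a ProviderConfig manifest that reads the Argo CD API
# token from a Kubernetes Secret. All names passed in are placeholders.
# Args: <config-name> <server-addr> <secret-namespace> <secret-name> <secret-key>
provider_config_manifest() {
  cat <<EOF
apiVersion: argocd.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: $1
spec:
  serverAddr: $2
  insecure: true
  credentials:
    source: Secret
    secretRef:
      namespace: $3
      name: $4
      key: $5
EOF
}

# Usage (needs kubectl access to the control plane; not executed here):
#   kubectl -n crossplane-system create secret generic argocd-credentials \
#     --from-literal=authToken="$ARGOCD_USER_TOKEN"
#   provider_config_manifest argocd "$ARGOCD_HOST:443" crossplane-system \
#     argocd-credentials authToken | kubectl apply -f -
```

Note that `serverAddr` is assumed to be a host:port pair without a scheme, unlike the `ARGOCD_SERVER` URL passed to curl.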

Now, using a test cluster (referred to below as 'test-cluster'), create the following Cluster managed resource:

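apiVersion: cluster.argocd.crossplane.io/v1alpha1
kind: Cluster
metadata:
  name: test-cluster
spec:
  forProvider:
    server: <server>
    config:
      bearerTokenSecretRef:
        key: token
        name: test-cluster-token
        namespace: <namespace>
    name: test-cluster
  providerConfigRef:
    name: argocd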
Now manually delete the Cluster through the ArgoCD API. The provider does not recreate it.

As added context, I tried running something like argocd cluster get doesntexist --server <server> with both the user's token and admin creds, and in both cases got the error FATA[0000] rpc error: code = PermissionDenied desc = permission denied. I expected a "Cluster Not Found" error instead.
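To make the ambiguity concrete, here is a minimal sketch of how a caller might classify the error text from a failed cluster lookup. It is not the provider's logic, and the function name `classify_cluster_get_error` is made up for illustration. Treating PermissionDenied as "maybe-missing" is an assumption based only on the behavior reported above; a string match like this cannot tell a genuine RBAC denial from a cluster that does not exist.

```shell
# Sketch only: map the error text of a failed cluster lookup to a coarse state.
#   missing        the server said the cluster does not exist
#   maybe-missing  PermissionDenied, which 2.9.1 appears to return for
#                  nonexistent clusters (assumption from this issue)
#   error          anything else
classify_cluster_get_error() {
  case "$1" in
    *"code = NotFound"*)         echo "missing" ;;
    *"code = PermissionDenied"*) echo "maybe-missing" ;;
    *)                           echo "error" ;;
  esac
}

# Usage (needs a logged-in argocd CLI; SERVER_URL is a placeholder):
#   err="$(argocd cluster get "$SERVER_URL" 2>&1 >/dev/null)" \
#     || classify_cluster_get_error "$err"
```

A more reliable check would confirm the "maybe-missing" case some other way, for example by listing clusters with credentials known to have access.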

What environment did it happen in?

Crossplane version: v1.13.2-up.2
Crossplane Provider argocd version:
Kubernetes client 1.25.3, server 1.26.10, on an Upbound Managed Control Plane
ArgoCD running in EKS, Helm release 5.51.2