crossplane-contrib / provider-argocd

Crossplane provider to provision and manage Argo CD objects
Apache License 2.0

Fail to Recreate Deleted Cluster in ArgoCD v2.9.1 #116

Closed by jefflantz 9 months ago

jefflantz commented 9 months ago

What happened?

First off, I have seen that the provider is configured to use ArgoCD v2.8.4 (https://github.com/crossplane-contrib/provider-argocd/blob/00b9ad6eb4f625d478351d2bc7d96d588e0ea7c7/go.mod#L6). However, I saw some unexpected behavior with ArgoCD 2.9.1 that will need to be addressed eventually, so I'm raising this issue now.

The behavior I saw: if a user deletes a Cluster that was managed by provider-argocd from the ArgoCD server, the Cluster is not recreated, and the managed resource stays in the state synced: false. Specifically, the error shown is:

Warning  CannotObserveExternalResource  2m12s (x376 over 3d7h)  managed/cluster  cannot get Argocd Cluster: rpc error: code = PermissionDenied desc = permission denied

In the controller logs, I see

2023-11-27T16:25:08Z    DEBUG   provider-argocd Cannot observe external resource        {... "error": "cannot get Argocd Cluster: rpc error: code = PermissionDenied desc = permission denied", "errorVerbose": "rpc error: code = PermissionDenied desc = permission denied\ncannot get Argocd Cluster\ngithub.com/crossplane-contrib/provider-argocd/pkg/controller/cluster.(*external).Observe\n\tgithub.com/crossplane-contrib/provider-argocd/pkg/controller/cluster/controller.go:120\ngithub.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*Reconciler).Reconcile\n\tgithub.com/crossplane/crossplane-runtime@v0.19.2/pkg/reconciler/managed/reconciler.go:780\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235\nruntime.goexit\n\truntime/asm_amd64.s:1594"}

Instead of throwing an error, I would expect the Cluster to be recreated.

How can we reproduce it?

I'm not sure exactly which version introduced the break; I couldn't tell just from looking at the release notes. We upgraded from 2.7.7 to 2.9.1, so you can try with ArgoCD 2.9.1. Also note that we're using the latest build of the provider, xpkg.upbound.io/crossplane-contrib/provider-argocd:v0.5.0-rc.0.2.gf446591, which came after this PR and could be related: https://github.com/crossplane-contrib/provider-argocd/pull/101/files

To go through our setup, first we added a user in our ArgoCD RBAC with the following permissions:

policy.csv: |
      p, <user>, clusters, *, *, allow

Then ran the following commands:

# Read the initial admin password from the cluster.
ARGOCD_ADMIN_SECRET=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
ARGOCD_SERVER="<ArgoCD server>"
# Log in as admin to get a session token (-k skips TLS verification).
ARGOCD_ADMIN_TOKEN=$(curl -s -X POST -k -H "Content-Type: application/json" --data '{"username":"admin","password":"'"$ARGOCD_ADMIN_SECRET"'"}' "${ARGOCD_SERVER}/api/v1/session" | jq -r .token)
# Generate an API token for <user>; the URL is quoted so the shell does not treat < and > as redirections.
curl -s -X POST -k -H "Authorization: Bearer $ARGOCD_ADMIN_TOKEN" -H "Content-Type: application/json" "${ARGOCD_SERVER}/api/v1/account/<user>/token" | jq -r .token

We then used this token to create a secret and referenced it in a provider config called argocd.
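For anyone reproducing this, the secret and provider config step might look roughly like the sketch below. It is not the exact configuration from this report: the secret name argocd-credentials, the crossplane-system namespace, the authToken key, the ARGOCD_USER_TOKEN variable and insecure: true are all assumptions, and the ProviderConfig field names follow the provider's README as I remember it, so check them against the version you run.

```shell
# Print a ProviderConfig manifest that reads the Argo CD token from a Secret.
# Arguments: <config-name> <secret-name> <secret-namespace> <server-addr>
# All names passed in are illustrative; nothing here is taken from the report.
render_provider_config() {
  cat <<EOF
apiVersion: argocd.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: $1
spec:
  serverAddr: $4
  insecure: true
  credentials:
    source: Secret
    secretRef:
      name: $2
      namespace: $3
      key: authToken
EOF
}

# Usage sketch (needs kubectl and cluster access, so it is left commented out):
#   kubectl -n crossplane-system create secret generic argocd-credentials \
#     --from-literal=authToken="$ARGOCD_USER_TOKEN"
#   render_provider_config argocd argocd-credentials crossplane-system "<ArgoCD server host:port>" \
#     | kubectl apply -f -
```

The function only prints the manifest, so it can be inspected before anything is applied to a cluster.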

Now, using a test cluster (referred to below as 'test-cluster'), create the following Cluster managed resource:

apiVersion: cluster.argocd.crossplane.io/v1alpha1
kind: Cluster
metadata:
  name: test-cluster
spec:
  forProvider:
    server: <server>
    config:
      bearerTokenSecretRef:
        key: token
        name: test-cluster-token
        namespace: <namespace>
    name: test-cluster
  providerConfigRef:
    name: argocd

Now manually delete the Cluster through the ArgoCD API; the provider does not recreate it.

For added context, I ran something like argocd cluster get doesntexist --server <server> with both the user's token and the admin credentials, and got the error FATA[0000] rpc error: code = PermissionDenied desc = permission denied. I expected a "Cluster Not Found" error instead.
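To compare ArgoCD versions quickly, a small helper like the one below can label which gRPC status code a lookup of a nonexistent cluster returns. It is only a sketch: the function name classify_argocd_error is made up for illustration, and it relies on the "code = ..." text that appears in the error messages quoted above.

```shell
# Classify the text of an Argo CD CLI/API error by its gRPC status code.
# classify_argocd_error is a hypothetical helper, not part of the argocd CLI.
classify_argocd_error() {
  case "$1" in
    *"code = NotFound"*)         echo "not-found" ;;
    *"code = PermissionDenied"*) echo "permission-denied" ;;
    *)                           echo "other" ;;
  esac
}

# Usage sketch (needs the argocd CLI and valid credentials, so it is left commented out):
#   out=$(argocd cluster get doesntexist --server "<server>" 2>&1)
#   classify_argocd_error "$out"
```

With the error text from this report the helper prints permission-denied, whereas the behavior I expected would print not-found.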

What environment did it happen in?

Crossplane version: v1.13.2-up.2
Crossplane Provider argocd version: xpkg.upbound.io/crossplane-contrib/provider-argocd:v0.5.0-rc.0.2.gf446591
Kubernetes: client 1.25.3, server 1.26.10, on an Upbound Managed Control Plane
ArgoCD: running in EKS, Helm release 5.51.2