argoproj / argo-cd

Remote cluster with Workload Identity configured with IAM Policy to the KSA, not working #20368

Open froblesmartin opened 3 weeks ago

froblesmartin commented 3 weeks ago

Describe the bug

I'm not sure if this is a bug or a feature request.

I have configured ArgoCD (running in GKE) with an external GKE cluster located in a different GCP Project. Following the official documentation to use Workload Identity with ArgoCD, creating the GCP IAM Service Account and adding the annotation to the KSA, it does work.

I tried the new approach from Google, which does not require a GCP IAM Service Account or the annotation on the KSA. Instead, you assign GCP IAM roles directly to the KSA by referencing it in GCP IAM policies with a principal like:

principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/NAMESPACE/sa/KSA_NAME
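For illustration, the identifier is just four values substituted into a fixed template. A minimal sketch in Python, where the function name `ksa_principal` and the sample values are hypothetical and only the template itself comes from the line above:

```python
def ksa_principal(project_number: str, project_id: str,
                  namespace: str, ksa_name: str) -> str:
    """Build the IAM principal identifier for a Kubernetes ServiceAccount.

    Follows the template quoted above; nothing is validated here.
    """
    return (
        "principal://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{project_id}.svc.id.goog/"
        f"subject/ns/{namespace}/sa/{ksa_name}"
    )


# Hypothetical sample values, for illustration only.
print(ksa_principal("123456789012", "my-project",
                    "argocd", "argocd-application-controller"))
```

Note that the template uses both the project number and the project ID, so mixing them up is an easy mistake to make when writing the binding by hand.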

But this does not work. When I try it, I just get this error from the argocd-application-controller:

error synchronizing cache state : the server has asked for the client to provide credentials

Maybe argocd-k8s-auth just needs a newer version of the GCP SDK, maybe it requires a different configuration, or maybe it is something harder.

To Reproduce

Enable GKE Workload Identity, and from a different GCP Project, assign the role for ArgoCD to manage a GKE cluster in that GCP Project:

gcloud projects add-iam-policy-binding projects/PROJECT_ID \
    --role=roles/container.admin \
    --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/argocd/sa/argocd-server \
    --condition=None

gcloud projects add-iam-policy-binding projects/PROJECT_ID \
    --role=roles/container.admin \
    --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/argocd/sa/argocd-application-controller \
    --condition=None

Then configure your remote GKE cluster with the following K8s manifest in the GKE cluster where ArgoCD is deployed:

apiVersion: v1
kind: Secret
metadata:
  name: argocd-cluster-remote1
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: remote1
  server: https://<your-remote-cluster-endpoint>
  config: |
    {
      "execProviderConfig": {
        "command": "argocd-k8s-auth",
        "args": ["gcp"],
        "apiVersion": "client.authentication.k8s.io/v1beta1"
      },
      "tlsClientConfig": {
        "insecure": false,
        "caData": "LS0tLS1...."
      }
    }

Expected behavior

I would expect ArgoCD to authenticate correctly.

Version

v2.12.4+27d1e64

apoole-q6cyber commented 1 week ago

I'm experiencing the same issue. I followed the steps above and tested the service accounts to make sure that they had container.admin and iam.serviceAccountAdmin, as described in the Google docs.

I also created a test pod to make sure I could list and mutate my clusters with the argocd-server/argocd-application-controller service accounts.

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: NAMESPACE
spec:
  serviceAccountName: argocd-server
  #serviceAccountName: argocd-application-controller
  containers:
  - name: test-pod
    image: google/cloud-sdk:slim
    command: ["sleep","infinity"]
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
        ephemeral-storage: 10Mi

List clusters:

gcloud container clusters list

The next thing I'm going to try is creating a fleet to see if that works: https://cloud.google.com/blog/products/containers-kubernetes/connect-gateway-with-argocd

bpoole6 commented 1 week ago

I was able to get it to work by having the KSA (Kubernetes Service Account) impersonate a GSA (GCP Service Account).

You can follow the instructions on how to do it here: Workload Federation Alternative.

Once you have it set up, what you have above should work. @froblesmartin

It's annoying that there aren't good examples of setting this up in the wild. I spent a day and a half trying to get this to work. 🙄

froblesmartin commented 1 week ago

Yes, creating a GCP IAM Service Account, assigning the role roles/iam.workloadIdentityUser to the KSA, and adding the annotation iam.gke.io/gcp-service-account to the KSA does work.

This issue is about the new approach, which does not require any of that 😄

bpoole6 commented 1 week ago

> I tried the new approach from Google, which does not require a GCP IAM Service Account or the annotation on the KSA. Instead, you assign GCP IAM roles directly to the KSA by referencing it in GCP IAM policies with a principal like:

Haha, yeah, I overlooked that part of your post.

Were you able to get workload identities working?

froblesmartin commented 1 week ago

@bpoole6 Yes, it did work, but only with the old approach, thus this issue. 😄

froblesmartin commented 1 week ago

@toVersus I saw that you implemented this. Would you know anything about this issue? What might be causing it? I would be happy to contribute, but it would be great to have some initial ideas 😄

toVersus commented 1 week ago

Oh, so it doesn’t work with the new Workload Identity Federation? I’m not too familiar with the details of OIDC, so I’m not sure what’s causing it, but it might be running into some limitation within Workload Identity Federation. As you know, the implementation of the argocd-k8s-auth gcp command is straightforward. It just uses the Google Cloud OAuth2 client library to retrieve the Application Default Credentials, extract the access token, and generate an ExecCredential.
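To make the last step concrete, here is a rough sketch of wrapping an already obtained access token in an ExecCredential. This is not the Argo CD code (which is written in Go); it is a Python illustration, the function name `exec_credential` is hypothetical, and the field layout is copied from the command output shown further down in this comment.

```python
import json
from datetime import datetime, timezone


def exec_credential(token: str, expiry: datetime) -> str:
    """Wrap an access token in a client.authentication.k8s.io/v1beta1 ExecCredential.

    `expiry` must be a timezone-aware datetime. The token itself is assumed
    to have been obtained elsewhere (for example from Application Default
    Credentials).
    """
    cred = {
        "kind": "ExecCredential",
        "apiVersion": "client.authentication.k8s.io/v1beta1",
        "spec": {"interactive": False},
        "status": {
            # Timestamp rendered in UTC with a trailing "Z".
            "expirationTimestamp": expiry.astimezone(timezone.utc).strftime(
                "%Y-%m-%dT%H:%M:%SZ"
            ),
            "token": token,
        },
    }
    return json.dumps(cred, separators=(",", ":"))
```

The Kubernetes client reads this JSON from the exec plugin's stdout and sends the token as a bearer token, so the plugin itself never needs to know which identity the token represents.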

> use a newer version of the GCP SDK

Updating golang.org/x/oauth2 shouldn’t be necessary, as Workload Identity Federation doesn’t require a specific version. As a test, I built the argocd-k8s-auth binary from the current master branch of Argo CD (golang.org/x/oauth2 v0.23.0), copied it to the Pod, and tried using it, but it still didn’t work and showed the same error.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: cluster-mycluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: mycluster
  server: <api_server_address>
  config: |
    {
      "execProviderConfig": {
        "command": "/home/argocd/argocd-k8s-auth",
        "args": ["gcp"],
        "apiVersion": "client.authentication.k8s.io/v1beta1"
      },
      "tlsClientConfig": {
        "insecure": false,
        "caData": "**Redacted**"
      }
    }
EOF

As you can see, we were able to obtain the access token from the GKE metadata server when using Workload Identity Federation. If it works the same way as before, it doesn’t seem like any code changes will be needed.

$ argocd-k8s-auth gcp
{"kind":"ExecCredential","apiVersion":"client.authentication.k8s.io/v1beta1","spec":{"interactive":false},"status":{"expirationTimestamp":"2024-10-29T14:11:36Z","token":"ya29.d.**redacted**"}}
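When debugging, it can help to confirm that the returned token has not expired yet without printing the token itself. A small sketch, assuming the output has the shape shown above; the function name `seconds_until_expiry` is hypothetical:

```python
import json
from datetime import datetime, timezone


def seconds_until_expiry(exec_credential_json: str, now: datetime) -> float:
    """Return how many seconds remain before the ExecCredential token expires.

    `now` must be a timezone-aware datetime. A negative result means the
    token has already expired.
    """
    status = json.loads(exec_credential_json)["status"]
    expiry = datetime.strptime(
        status["expirationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (expiry - now).total_seconds()
```

For example, the output of `argocd-k8s-auth gcp` could be captured into a string and passed in together with `datetime.now(timezone.utc)`.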

However, there seem to be some limitations that might be affecting it. For example, the metadata server refuses to issue an identity token when the KSA has no annotation:

$ curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/identity
Your Kubernetes service account (argocd/argocd-application-controller) is not annotated with a target Google service account, which is a requirement for retrieving Identity Tokens using Workload Identity.
Please add the iam.gke.io/gcp-service-account=[GSA_NAME]@[PROJECT_ID] annotation to your Kubernetes service account.
Refer to https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
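To compare the two endpoints side by side from inside a Pod, something like the following could be used. It is only a sketch: the helper names are hypothetical, the `identity` path is the one from the curl command above, and the `token` path is the sibling access-token endpoint of the metadata server. `fetch` only works from inside a GKE Pod.

```python
import urllib.error
import urllib.request

# Base path taken from the curl command above.
METADATA_BASE = (
    "http://169.254.169.254/computeMetadata/v1/"
    "instance/service-accounts/default/"
)


def metadata_request(kind: str) -> urllib.request.Request:
    """Build a request for the 'token' (access token) or 'identity' endpoint."""
    if kind not in ("token", "identity"):
        raise ValueError(f"unknown endpoint kind: {kind}")
    return urllib.request.Request(
        METADATA_BASE + kind,
        headers={"Metadata-Flavor": "Google"},
    )


def fetch(kind: str, timeout: float = 5.0) -> str:
    """Return the response body, including error bodies such as the one above."""
    try:
        with urllib.request.urlopen(metadata_request(kind), timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        # The metadata server explains failures in the response body.
        return err.read().decode()
```

Calling `fetch("token")` and `fetch("identity")` from the same Pod would show whether only identity tokens are affected by the missing annotation.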

I’m sorry I couldn’t be of more help, but this is all I know for now.