argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

ArgoCD and Gcloud Workload Identity Improvements #17279

Open withnale opened 9 months ago

withnale commented 9 months ago

We make extensive use of ArgoCD to bootstrap our Kubernetes clusters, and we have many clusters that we are constantly deploying, so our bootstrap code is exercised daily. Below are the steps required to enable ArgoCD throughout our estate.

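Our main goals are to use Workload Identity for Helm OCI charts hosted in Google Artifact Registry and for git repositories hosted in Google Cloud Source Repos.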

NB: I have deliberately tried to consolidate all of the work to get Workload Identity working in the current ArgoCD release below, since this is not well documented at present and I had to piece together information deep within the comments of multiple issues as well as the source code to get this working.

I am aware of the discussion in the issue https://github.com/argoproj/argo-cd/issues/10218 and the preference not to "reinvent the wheel" and delegate the responsibility for this to a separate deployment of the External Secrets Operator. However, this doesn't provide for a very clean solution when you wish for ArgoCD to be the first thing to be bootstrapped onto a cluster and mastering subsequent payloads.

The work to support Workload Identity for Source Repositories introduced a gcpServiceAccountKey field to the Secret manifest. Although this is still very verbose, it would be great if this approach also worked for OCI registries, since it is possible to log in to OCI registries using this token and it is already available for git repositories. However, after looking through the code, it's clear that this is only usable for git repos.

Would you accept a PR to allow this to be used for OCI repos? This feels like a relatively small change that would alleviate the need for an initContainer in the approaches outlined below.

In the longer term, it would be great if this setup could be made much cleaner. There is a lot of complexity here, and with so many moving parts it would be very easy to get something wrong. In particular, the use of gcpServiceAccountKey seems very complex, with a great deal of per-cluster substitution required just to consume a token that is already mounted within the container.

Current Implementation

We use Helm to install ArgoCD. Fortunately the chart is pretty flexible and allows initContainers and additional volumes to be defined from values.yaml, so the instructions below work without chart modification.

Of course, with both approaches you need to have set up Workload Identity for the repoServer serviceAccount. You have access to .repoServer.serviceAccount.annotations within the values.yaml file to configure WI from the KSA side.
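
On the Google side, the Kubernetes service account also has to be allowed to impersonate the Google service account. The following is a minimal sketch, assuming the standard GKE Workload Identity binding (role roles/iam.workloadIdentityUser, with the KSA annotated with iam.gke.io/gcp-service-account). The helper name wi_member, the project ID, and the argocd/argocd-repo-server namespace and service account names are illustrative assumptions, not values taken from the chart.

```shell
# Build the IAM member string that identifies a Kubernetes service account
# under GKE Workload Identity:
#   serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]
# wi_member is a hypothetical helper name used only for this sketch.
wi_member() {
  # $1 = cluster project ID, $2 = namespace, $3 = Kubernetes service account
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Example use (not executed here; GSA_EMAIL and the project ID are placeholders):
#   gcloud iam service-accounts add-iam-policy-binding "$GSA_EMAIL" \
#     --role roles/iam.workloadIdentityUser \
#     --member "$(wi_member my-cluster-project argocd argocd-repo-server)"
```

Keeping the member string in one helper avoids retyping the bracketed namespace/name pair, which is easy to get wrong by hand.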

Helm OCI Support to Google Artifact Registry using Workload Identity

The approach to enable this is broadly outlined in the following issue: https://github.com/argoproj/argo-cd/issues/11492

This generates an initContainer to configure the repoServer with gcloud docker credentials, and shares the necessary volumes with the main container.

# values.yaml
repoServer:
  volumes:
    - name: docker-credential-gcr
      emptyDir: {}
    - name: docker-config
      emptyDir: {}
  volumeMounts:
    - mountPath: /home/argocd/.docker
      name: docker-config
    - name: docker-credential-gcr
      mountPath: /usr/local/bin/docker-credential-gcr
      subPath: docker-credential-gcr
      readOnly: true
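  initContainers:
    # Copies the credential helper into the shared volume, registers it for the
    # Artifact Registry host, and makes the generated Docker config readable by
    # the repo-server container, which mounts the same volume at /home/argocd/.docker.
    - command:
        - sh
        - -c
        - >
          cp -rp /tmp/docker-credential-gcr /usr/local/bin/ &&
          docker-credential-gcr configure-docker
          --registries=europe-west1-docker.pkg.dev &&
          chmod +r /root/.docker/config.json
      image: mypersonalrepo/docker-credential-gcr:latest
      name: copy-docker-credential-gcr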
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        seccompProfile:
          type: RuntimeDefault
      volumeMounts:
        - name: docker-credential-gcr
          mountPath: /usr/local/bin
        - mountPath: /root/.docker
          name: docker-config

This is slightly different from the original issue since it is referencing a custom image for the initContainer instead of dynamically pulling the tarball down at runtime (and having an associated runtime dependency that might fail).

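# Dockerfile
FROM alpine:latest

# Download the docker-credential-gcr release and unpack the binary into /tmp,
# where the initContainer command expects to find it, then remove the tarball.
RUN wget -O /tmp/docker-credential-gcr.tar.gz https://github.com/GoogleCloudPlatform/docker-credential-gcr/releases/download/v2.1.22/docker-credential-gcr_linux_amd64-2.1.22.tar.gz \
    && tar -xzf /tmp/docker-credential-gcr.tar.gz -C /tmp \
    && rm /tmp/docker-credential-gcr.tar.gz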

NB: I have deliberately used docker-credential-gcr rather than gcloud-cli since installing gcloud-cli generates an image over 2GB in size. We only need the credential helper and so this is a much smaller image and docker-credential-gcr is still maintained by Google.

After that, copy the stefanprodan podinfo chart into your registry:

oras cp -r ghcr.io/stefanprodan/charts/podinfo:6.5.4 europe-west1-docker.pkg.dev/registry-project/registry/stefanprodan/charts/podinfo:6.5.4

You should then be able to apply the following Application manifest:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: podinfo
  namespace: argocd
spec:
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  project: default
  source:
    chart: podinfo
    helm:
      passCredentials: false
    repoURL: europe-west1-docker.pkg.dev/registry-project/registry/stefanprodan/charts
    targetRevision: 6.5.4

Git support to Google Cloud Source Repos using Workload Identity

The approach to enable this is broadly outlined in the following issue: https://github.com/argoproj/argo-cd/issues/15361

This requires mounting a projected volume of the Workload Identity Kubernetes service account token into the argocd-repo-server pod. Once it is present, you can define an ArgoCD repository secret that includes a gcpServiceAccountKey payload of type external_account, using the mounted token as the credential source.

# values.yaml
repoServer:
  volumes:
    - name: gcp-ksa
      projected:
        defaultMode: 420
        sources:
          - serviceAccountToken:
              path: token
              audience: __CLUSTER_PROJECT_ID__.svc.id.goog
              expirationSeconds: 172800
  volumeMounts:
    - name: gcp-ksa
      mountPath: /var/run/secrets/tokens/gcp-ksa
      readOnly: true

Then you need to create an ArgoCD external account secret that references the token (replacing the placeholders with your own values).

apiVersion: v1
kind: Secret
metadata:
  name: repo-gcloud-source
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://source.developers.google.com/p/__REGISTRY_PROJECT_ID__/r/__GIT_REPO_NAME__
  gcpServiceAccountKey: |
    {
      "type": "external_account",
      "audience": "identitynamespace:__CLUSTER_PROJECT_ID__.svc.id.goog:https://container.googleapis.com/v1/projects/__CLUSTER_PROJECT_ID__/locations/europe-west1/clusters/__CLUSTER_NAME__",
      "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/__CLUSTER_GCLOUD_SERVICE_ACCOUNT__:generateAccessToken",
      "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
      "token_url": "https://sts.googleapis.com/v1/token",
      "client_email": "must.be.non.empty@localhost.localdomain",
      "credential_source": {
        "file": "/var/run/secrets/tokens/gcp-ksa/token"
      }
    }
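
The per-cluster substitution for a manifest like the one above can be sketched with sed. This is only an illustration: the function name render_repo_secret and the template file name in the usage comment are hypothetical, the placeholder names are the ones used in this issue, and it assumes the substituted values contain no |, & or backslash characters. The europe-west1 location in the audience is left as written in the template.

```shell
# Fill the per-cluster placeholders used in the repository Secret.
# Reads a template on stdin and writes the rendered text to stdout.
# render_repo_secret is a hypothetical helper name used only for this sketch.
render_repo_secret() {
  # $1 = cluster project ID, $2 = registry project ID, $3 = git repo name,
  # $4 = cluster name,       $5 = Google service account email
  sed \
    -e "s|__CLUSTER_PROJECT_ID__|$1|g" \
    -e "s|__REGISTRY_PROJECT_ID__|$2|g" \
    -e "s|__GIT_REPO_NAME__|$3|g" \
    -e "s|__CLUSTER_NAME__|$4|g" \
    -e "s|__CLUSTER_GCLOUD_SERVICE_ACCOUNT__|$5|g"
}

# Example use (not executed here; the file name is an assumption):
#   render_repo_secret my-cluster-project registry-project git-repo-name \
#     my-cluster argocd@my-cluster-project.iam.gserviceaccount.com \
#     < repo-secret.tmpl.yaml | kubectl apply -f -
```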

Once this is created you can then create an Application resource that uses this repo.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: podinfo-git
  namespace: argocd
spec:
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  project: default
  source:
    repoURL: https://source.developers.google.com/p/registry-project/r/git-repo-name
    path: components/podinfo
    targetRevision: main

Comparison to Flux Helm Workload Identity Support

The Flux Helm OCI Workload Identity support is native, as befits a system that is typically the first thing to be bootstrapped onto a cluster. It is also very concise, with the WI support being enabled by the single provider: gcp directive in the HelmRepository resource.

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: charts-dev
  namespace: flux-system
spec:
  provider: gcp
  type: oci
  url: oci://europe-docker.pkg.dev/registry-project/registry

danielyaba commented 8 months ago

Hi @withnale,

I did everything you mentioned in "Git support to Google Cloud Source Repos using Workload Identity".

When I try to add a Google source repository, I can see that a new secret has been added with the name repo-

The service account in GCP has Workload Identity permissions, and the annotation has been added to the "argocd-repo-server" Kubernetes service account.

I always get the error "error testing repository connectivity: authentication required".

Even after changing the logging level to debug, the repo-server logs still display only this error.

BTW, I added the repository with connection method HTTPS

What further steps can I take to debug the issue?

withnale commented 8 months ago

What further steps can I take to debug the issue?

It's very tricky to set up, and there is next to no logging even at debug level when credentials are being used, which makes it especially difficult.

The first question I had was whether the repo-gcloud-source secret was even being picked up correctly, before it tries to associate the gcloud credential. The easiest way I found to validate this was to change the credential_source.file reference to a file that does not exist. In that event, you at least get an error in the logs telling you that it's trying to use a token file that doesn't exist.

After that, make sure you've replaced all the references such as __CLUSTER_PROJECT_ID__ with values specific to your setup.
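
One way to check for leftover placeholders is to scan the rendered manifest for anything that still looks like __NAME__. This is a rough sketch that assumes the placeholder style used in this issue; the function name find_unreplaced_placeholders and the file name in the usage comment are made up for illustration.

```shell
# List any __PLACEHOLDER__ tokens still present in a rendered manifest.
# Reads stdin, prints each leftover placeholder once, and returns 1 if any
# were found (0 if the input is clean).
# find_unreplaced_placeholders is a hypothetical helper name.
find_unreplaced_placeholders() {
  # LC_ALL=C keeps [A-Z] limited to ASCII upper-case letters.
  leftovers=$(LC_ALL=C grep -o '__[A-Z][A-Z0-9_]*__' | LC_ALL=C sort -u) || true
  if [ -n "$leftovers" ]; then
    printf '%s\n' "$leftovers"
    return 1
  fi
  return 0
}

# Example use (not executed here; the file name is an assumption):
#   find_unreplaced_placeholders < repo-secret.yaml || echo "placeholders remain"
```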

You can always kubectl exec or kubectl debug into the pod and try some of the stuff interactively.
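
As a starting point for that kind of interactive check, the sketch below tests that the projected token file is mounted and non-empty inside the repo-server pod. The namespace, deployment name and token path defaults are assumptions based on the manifests in this issue and may differ with your Helm release name. The KUBECTL variable is only a convention of this sketch (not a kubectl feature) so that the command can be previewed, and check_repo_server_token is a hypothetical helper name.

```shell
# Check from outside the pod that the projected service account token file
# exists and is non-empty. Set KUBECTL=echo to print the command instead of
# running it. check_repo_server_token is a hypothetical helper name.
check_repo_server_token() {
  # $1 = namespace, $2 = deployment name, $3 = token path inside the pod
  ${KUBECTL:-kubectl} exec -n "${1:-argocd}" "deploy/${2:-argocd-repo-server}" -- \
    sh -c 'test -s "$1" && echo "token present: $1" || echo "token missing or empty: $1"' sh \
    "${3:-/var/run/secrets/tokens/gcp-ksa/token}"
}

# Example use (not executed here):
#   check_repo_server_token argocd argocd-repo-server
```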

withnale commented 8 months ago

Can I get some feedback on this? It would be great to improve this flow.

danielyaba commented 8 months ago

Unfortunately I didn't get it working

withnale commented 8 months ago

No. I meant, can I get some feedback from one of the developers about whether we can get improvements to make this way easier!?

The above works for us consistently, so it is possible. It's just amazingly clunky.

danielyaba commented 8 months ago

Hi @withnale,

I was able to make it work :-) My problem was that I am using the branch "main", while ArgoCD by default tried to fetch the branch "master" (which is the default one). How can I set ArgoCD to fetch the "main" branch? I didn't find anything on the GCP side to accomplish this.