google-github-actions / get-secretmanager-secrets

A GitHub Action for accessing secrets from Google Secret Manager and making them available as outputs.
https://cloud.google.com/secret-manager
Apache License 2.0

Issue with locating problem in step to get secret #151

Closed tobiasehlert closed 2 years ago

tobiasehlert commented 2 years ago

TL;DR

Getting secrets from Google Secret Manager by CLI works but not with this GitHub Action app.

Expected behavior

I expect the step to return a secret in an output, but the step that is supposed to get the secret from Google Secret Manager throws an error instead.

Observed behavior

When using the google-github-actions/get-secretmanager-secrets GitHub Action, I encounter the following error:

Error: google-github-actions/get-secretmanager-secrets failed with: Error: Permission 'secretmanager.versions.access' denied for resource 'projects/123456789012/secrets/this-is-my-secret/versions/latest' (or it may not exist).

When running the gcloud command directly in the same workflow, the secret in question is retrieved without problems.

Action YAML

name: my-test-workflow

on:
  workflow_dispatch:

env:
  gcloud_project: secrets-project

jobs:
  test-workflow:
    runs-on: [self-hosted, gke-cluster]

    steps:
      - name: Checkout
        uses: actions/checkout@v2.4.0

      - name: Create environment variables
        run: |
          echo gcloud_version=$(/google-cloud-sdk/bin/gcloud --format=json version|jq -r '."Google Cloud SDK"') >> $GITHUB_ENV

      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v0.3.0
        with:
          project_id: ${{ env.gcloud_project }}
          version: ${{ env.gcloud_version }}

      - name: Get Google Secretsmanager secret by CLI
        id: "cli-secrets"
        run: |
          echo "::set-output name=my-secret::$(gcloud secrets versions access latest --secret='this-is-my-secret')"

      - name: Get Google Secretsmanager secret by APP
        id: "app-secrets"
        uses: "google-github-actions/get-secretmanager-secrets@v0.3.1"
        with:
          secrets: |-
            my-secret:123456789012/this-is-my-secret

      - name: Write the secrets content
        if: always()
        run: |
          echo "${{ steps.cli-secrets.outputs.my-secret }}"
          echo "${{ steps.app-secrets.outputs.my-secret }}"
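
For reference, the short `PROJECT/SECRET` form in the `secrets` input and the full resource name in the error message appear to be related as sketched below. This is only an illustration, assuming the action accepts `PROJECT/SECRET` or `PROJECT/SECRET/VERSION` and falls back to `latest` when no version is given; `expand_secret_ref` is a hypothetical helper and not part of the action.

```shell
# Hypothetical helper: expand "PROJECT/SECRET[/VERSION]" into a full
# Secret Manager resource name. The default version is assumed to be "latest".
expand_secret_ref() {
  ref="$1"
  project="${ref%%/*}"   # everything before the first slash
  rest="${ref#*/}"       # everything after the first slash
  case "$rest" in
    */*) secret="${rest%%/*}"; version="${rest#*/}" ;;
    *)   secret="$rest"; version="latest" ;;
  esac
  printf 'projects/%s/secrets/%s/versions/%s\n' "$project" "$secret" "$version"
}

# The reference used in the workflow above
expand_secret_ref "123456789012/this-is-my-secret"
```

Under these assumptions the call prints the same resource name that appears in the error message, which suggests the reference is parsed as intended and the failure is about permissions rather than the input format.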

Additional information

GitHub runners are running in GCP (inside one GKE cluster). For authentication, I am using Workload Identity Federation.

A separate authentication step is not required, since the runner authenticates automatically. One thing I noted:

Also, the Alpine-based image that we use as our base already has gcloud installed at a certain version (plus terraform and some other tools). Therefore we read that installed version and set it as an environment variable.

I turned on ACTIONS_STEP_DEBUG in the repository to try to get more information on what happens inside the step, but it doesn't give me any more usable output.

sethvargo commented 2 years ago

I'm looking into this. So far I've created a VM on GCE and cannot reproduce the issue. I'm working to provision a GKE cluster, but it would be helpful to know more about your GKE setup, how you're running the GitHub Actions Runner, pod configuration, etc.

sethvargo commented 2 years ago

I'm unable to reproduce using a GKE Autopilot cluster either:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: actions-runner

spec:
  replicas: 1
  selector:
    matchLabels:
      app: actions-runner
  template:
    metadata:
      labels:
        app: actions-runner
    spec:
      volumes:
      - name: workdir
        emptyDir: {}
      containers:
      - name: runner
        image: myoung34/github-runner:latest
        resources:
          requests:
            memory: "2Gi"
            cpu: "2000m"
          limits:
            memory: "10Gi"
            cpu: "2000m"  # a limit must be at least as large as the request above
        env:
        - name: ACCESS_TOKEN
          value: '...'
        - name: RUNNER_SCOPE
          value: "org"
        - name: ORG_NAME
          value: sethvargo-demos
        - name: RUNNER_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: RUNNER_WORKDIR
          value: /tmp/github-runner
        volumeMounts:
        - name: workdir
          mountPath: /tmp/github-runner

tobiasehlert commented 2 years ago

Hi @sethvargo,

We use terraform to setup a normal GKE cluster (not on autopilot) and then we use ArgoCD to selig 10 GitHub runner replicas into a namespace, which are registered on one repository only and not on an organizational level. The image that we build is an Alpine container where we install applications like gcloud, Helm, terraform, terragrunt and Velero. After that we add some more Docker steps to add the GitHub runner related binaries, and when the runners are deployed by Argo, they register against our repo.

One thing (which I mentioned) is that we have like an provisional admin account used to deploy GCP resources with terraform and that account is in another project as where the GKE cluster is located. So the SA that the cluster things and seems to be used in the south step is not the right one. I thought by just adding the other project id, the auth should work. And it does when running gcloud from the cli but not when doing the gsm (secrets manager) step.

Are you using a separate SA from inside GCP that you create a pool as described in the setup steps? Because we don't use the one inside the GKE cluster.

Is they maybe some possibility to add a debug flag to the step to maybe get some more output like what binary it is running or what user it is actually using when performing the attempt to connect to the GSM? Today there is nothing possible to see in the logs even with the actions debug flag set to true.

Regards, Tobias

sethvargo commented 2 years ago

Hi @tobiasehlert

Without having your exact setup, it's very difficult to try and debug this.

We use terraform to setup a normal GKE cluster

There is no such thing as a "normal" GKE cluster. Do you have Workload Identity enabled on the cluster? Are there multiple node pools? Do you have metadata concealment enabled?

ArgoCD to selig 10 GitHub runner replicas into a namespace

I'm not familiar with ArgoCD. What does "selig" mean?

One thing (which I mentioned) is that we have like an provisional admin account used to deploy GCP resources with terraform and that account is in another project as where the GKE cluster is located.

I'm not sure I follow. What service account is attached to the node pool? What permissions does that service account have? Is that service account mapped to the Kubernetes service account via Workload Identity?

Unless you have an organizational constraint which prevents cross-project permissions, it doesn't matter that the service account resides in a different project.

So the SA that the cluster things and seems to be used in the south step is not the right one.

Sorry, I don't understand what you're saying here.

And it does when running gcloud from the cli but not when doing the gsm (secrets manager) step.

We don't do anything "special" with regards to resolving credentials. We use the default GCP client libraries which resolve Application Default Credentials.
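
Since the action relies on Application Default Credentials (ADC) while gcloud keeps its own credential store, the two can resolve to different identities. The sketch below approximates the commonly documented ADC lookup order on Linux: the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, then the well-known gcloud file, then the metadata server. `adc_source` is a hypothetical helper for illustration; it does not call the client library and only guesses which source would be picked.

```shell
# Hypothetical helper: report which credential source ADC would most likely
# use on a Linux runner. This mirrors the documented lookup order only; it
# does not validate the credentials themselves.
adc_source() {
  well_known="$HOME/.config/gcloud/application_default_credentials.json"
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS"
  elif [ -f "$well_known" ]; then
    echo "well-known-file"
  else
    # On GCE/GKE this is the attached or Workload Identity service account
    echo "metadata-server"
  fi
}

adc_source
```

Running this in a workflow step just before the failing step gives a rough hint about where the client library is likely to look first.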

Are you using a separate SA from inside GCP that you create a pool as described in the setup steps? Because we don't use the one inside the GKE cluster.

I'm not sure what you mean here.

Is they maybe some possibility to add a debug flag to the step to maybe get some more output like what binary it is running or what user it is actually using when performing the attempt to connect to the GSM?

There is no binary running. The action uses the default client library which resolves Application Default Credentials automatically, so there's nothing to log. I've also verified this is working with an Autopilot cluster and a GKE cluster with Workload Identity enabled using the app.yaml above, so I'm fairly certain this is something related to your configuration.

tobiasehlert commented 2 years ago

Hi @sethvargo,

Sorry for the typos, I was using my loaner-cellphone with some annoying autocorrect 😵‍💫

First, answering your questions:

We use terraform to setup a normal GKE cluster

There is no such thing as a "normal" GKE cluster. Do you have Workload Identity enabled on the cluster? Are there multiple node pools? Do you have metadata concealment enabled?

Details about the cluster:

What is metadata concealment?

ArgoCD to selig 10 GitHub runner replicas into a namespace

I'm not familiar with ArgoCD. What does "selig" mean?

It was supposed to be "set up". Sorry for the confusion.

One thing (which I mentioned) is that we have like an provisional admin account used to deploy GCP resources with terraform and that account is in another project as where the GKE cluster is located.

I'm not sure I follow. What service account is attached to the node pool? What permissions does that service account have? Is that service account mapped to the Kubernetes service account via Workload Identity?

Unless you have an organizational constraint which prevents cross-project permissions, it doesn't matter that the service account resides in a different project.

When doing gcloud info in a step, this was what I was seeing (masked):

Account: [my-gke-cluster-1g68.svc.id.goog] 
Project: [my-gke-cluster-1g68]

What I was expecting to see:

Account: [my-terraform-admin@secrets-project.iam.gserviceaccount.com]
Project: [secrets-project]
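
A rough way to tell these two cases apart inside a step is sketched below. It assumes the `Account: [...]` line format shown in the masked output above; `account_from_info` and `classify_account` are hypothetical helpers, not gcloud features.

```shell
# Hypothetical helper: pull the account out of "gcloud info"-style text on stdin.
account_from_info() {
  sed -n 's/^Account: \[\(.*\)\].*$/\1/p'
}

# Hypothetical helper: classify the account by its suffix.
classify_account() {
  case "$1" in
    *.svc.id.goog)             echo "workload-identity-pool" ;;
    *.iam.gserviceaccount.com) echo "google-service-account" ;;
    *)                         echo "other" ;;
  esac
}

# Sample input copied from the masked output above; in a real step the
# output of "gcloud info" would be piped in instead.
seen="$(printf 'Account: [my-gke-cluster-1g68.svc.id.goog] \nProject: [my-gke-cluster-1g68]\n' | account_from_info)"
classify_account "$seen"
```

An account ending in `.svc.id.goog` is the cluster's workload identity pool rather than a Google service account such as `my-terraform-admin@secrets-project.iam.gserviceaccount.com`.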

So the SA that the cluster things and seems to be used in the south step is not the right one.

Sorry, I don't understand what you're saying here.

The my-gke-cluster-1g68.svc.id.goog is nothing that is mapped to a svc in the project that is supposed to have elevated permissions. So yeah.. for some reason those should not be used?

Are you using a separate SA from inside GCP that you create a pool as described in the setup steps? Because we don't use the one inside the GKE cluster.

I'm not sure what you mean here.

Basically this is our setup (very simplified):

So basically, I want to use the my-terraform-admin SA when getting the secret from my secrets-project, because the SA in my-gke-cluster-1g68 doesn't have permissions accessing those.

Kind regards, Tobias

sethvargo commented 2 years ago

What is metadata concealment?

https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment

It's mutually exclusive with Workload Identity.

When doing gcloud info in a step, this was what I was seeing (masked):

What is the output of the following inside the step:

curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/

The my-gke-cluster-1g68.svc.id.goog is nothing that is mapped to a svc in the project that is supposed to have elevated permissions. So yeah.. for some reason those should not be used?

If you've enabled Workload Identity on the cluster (which it looks like you have), then you need to map the Kubernetes Service Account (KSA) to a Google Service Account (GSA). This is a two-step process that involves:

  1. An IAM grant

    gcloud iam service-accounts add-iam-policy-binding GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME]"

  2. A Kubernetes annotation

    kubectl annotate serviceaccount KSA_NAME \
        --namespace K8S_NAMESPACE \
        iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com

If you're running in the default namespace with the default service account and a GSA named "my-service", that would look like:

gcloud iam service-accounts add-iam-policy-binding my-service@PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[default/default]"

kubectl annotate serviceaccount default \
    --namespace default \
    iam.gke.io/gcp-service-account=my-service@PROJECT_ID.iam.gserviceaccount.com

So basically, I want to use the my-terraform-admin SA when getting the secret from my secrets-project, because the SA in my-gke-cluster-1g68 doesn't have permissions accessing those.

Then you should:

  1. (Optional) Create a namespace
  2. Create a dedicated KSA
  3. Grant the KSA permission to impersonate the GSA (step 1 above)
  4. Annotate the KSA with the name of the GSA

There are some more general-purpose troubleshooting steps in the GCP docs.

sethvargo commented 2 years ago

Closing due to lack of response.