google-github-actions / get-gke-credentials

A GitHub Action that configures authentication to a GKE cluster.
https://cloud.google.com/gke
Apache License 2.0

use_internal_ip doesn't work for my private GKE cluster #308

Closed yanqianglu closed 2 months ago

yanqianglu commented 4 months ago

TL;DR

I've set use_internal_ip for my private GKE cluster, but kubectl commands keep timing out.

Expected behavior

kubectl get nodes -v=10

The cluster's nodes should be listed.

Observed behavior

I0622 02:35:53.986810    1628 loader.go:395] Config loaded from file:  /home/runner/work/xx/xx/gha-kubeconfig-5b2ec3b475445844
I0622 02:35:53.987859    1628 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json;g=apidiscovery.k8s.io;v=v2;as=APIGroupDiscoveryList,application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList,application/json" -H "User-Agent: kubectl/v1.30.2 (linux/amd64) kubernetes/3968350" -H "Authorization: ***" 'https://172.16.0.2/api?timeout=32s'
I0622 02:36:23.988713    1628 round_trippers.go:508] HTTP Trace: Dial to tcp:172.16.0.2:443 failed: dial tcp 172.16.0.2:443: i/o timeout
I0622 02:36:23.988773    1628 round_trippers.go:553] GET https://172.16.0.2/api?timeout=32s  in 30000 milliseconds
I0622 02:36:23.988784    1628 round_trippers.go:570] HTTP Statistics: DNSLookup 0 ms Dial 30000 ms TLSHandshake 0 ms Duration 30000 ms
I0622 02:36:23.988792    1628 round_trippers.go:577] Response Headers:
E0622 02:36:23.988945    1628 memcache.go:265] couldn't get current server API group list: Get "https://172.16.0.2/api?timeout=32s": dial tcp 172.16.0.2:443: i/o timeout
I0622 02:36:23.990477    1628 cached_discovery.go:120] skipped caching discovery info due to Get "https://172.16.0.2/api?timeout=32s": dial tcp 172.16.0.2:443: i/o timeout
I0622 02:36:23.990626    1628 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json;g=apidiscovery.k8s.io;v=v2;as=APIGroupDiscoveryList,application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList,application/json" -H "User-Agent: kubectl/v1.30.2 (linux/amd64) kubernetes/3968350" -H "Authorization: ***" 'https://172.16.0.2/api?timeout=32s'
I0622 02:36:53.991522    1628 round_trippers.go:508] HTTP Trace: Dial to tcp:172.16.0.2:443 failed: dial tcp 172.16.0.2:443: i/o timeout
I0622 02:36:53.991568    1628 round_trippers.go:553] GET https://172.16.0.2/api?timeout=32s  in 30000 milliseconds
I0622 02:36:53.991580    1628 round_trippers.go:570] HTTP Statistics: DNSLookup 0 ms Dial 30000 ms TLSHandshake 0 ms Duration 30000 ms
I0622 02:36:53.991587    1628 round_trippers.go:577] Response Headers:
E0622 02:36:53.991635    1628 memcache.go:265] couldn't get current server API group list: Get "https://172.16.0.2/api?timeout=32s": dial tcp 172.16.0.2:443: i/o timeout
I0622 02:36:53.991646    1628 cached_discovery.go:120] skipped caching discovery info due to Get "https://172.16.0.2/api?timeout=32s": dial tcp 172.16.0.2:443: i/o timeout
I0622 02:36:53.991655    1628 shortcut.go:103] Error loading discovery information: Get "https://172.16.0.2/api?timeout=32s": dial tcp 172.16.0.2:443: i/o timeout
I0622 02:36:53.991738    1628 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json;g=apidiscovery.k8s.io;v=v2;as=APIGroupDiscoveryList,application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList,application/json" -H "User-Agent: kubectl/v1.30.2 (linux/amd64) kubernetes/3968350" -H "Authorization: ***" 'https://172.16.0.2/api?timeout=32s'

After disabling authorized networks and removing use_internal_ip, it works.

Action YAML

name: Build and Deploy to GKE

on:
  push:
    tags:
      - "backend-v*"
  workflow_dispatch:
    inputs:
      tag:
        description: "Tag of the source code to deploy"
        required: true

permissions:
  contents: read
  id-token: write

env:
  PROJECT_ID: xx-svc-ddx1
  GKE_CLUSTER: projects/xx/locations/us-west1/clusters/autopilot-cluster-1
  DEPLOYMENT_NAME: xx-backend
  IMAGE: xx-backend
  ARTIFACT_REGISTRY_REGION: us-west1
  REPOSITORY: xx

jobs:
  build-and-publish:
    name: Build and Publish
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.tag || github.ref }}

      - id: "auth"
        uses: "google-github-actions/auth@v2.1.3"
        with:
          workload_identity_provider: "projects/xx/locations/global/workloadIdentityPools/github/providers/xx-repo"
          service_account: "github-actions-sa@xx-svc-ddx1.iam.gserviceaccount.com"

      # Setup gcloud CLI
      - name: Setup gcloud CLI
        uses: google-github-actions/setup-gcloud@v2.1.0
        with:
          project_id: ${{ env.PROJECT_ID }}

      # Configure Docker to use gcloud as a credential helper
      - name: Configure Docker
        run: gcloud auth configure-docker "${{ env.ARTIFACT_REGISTRY_REGION }}-docker.pkg.dev"

      # Check if the Docker image already exists
      - name: Check if Image Exists
        id: check-image
        run: |
          if gcloud artifacts docker tags list "${{ env.ARTIFACT_REGISTRY_REGION }}-docker.pkg.dev/${{ env.PROJECT_ID }}/${{ env.REPOSITORY }}/${{ env.IMAGE }}" | grep -w "${{ github.event.inputs.tag || github.ref_name }}\s"; then
            echo "Image exists"
            echo "IMAGE_EXISTS=true" >> $GITHUB_ENV
          else
            echo "Image does not exist"
            echo "IMAGE_EXISTS=false" >> $GITHUB_ENV
          fi

      - name: Set up Docker Buildx
        if: env.IMAGE_EXISTS == 'false'
        uses: docker/setup-buildx-action@v3.3.0

      - name: Cache Docker layers
        if: env.IMAGE_EXISTS == 'false'
        uses: actions/cache@v3
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-

      - name: Build and Push Docker image
        if: env.IMAGE_EXISTS == 'false'
        uses: docker/build-push-action@v6.1.0
        with:
          context: backend
          file: backend/Dockerfile
          tags: ${{ env.ARTIFACT_REGISTRY_REGION }}-docker.pkg.dev/${{ env.PROJECT_ID }}/${{ env.REPOSITORY }}/${{ env.IMAGE }}:${{ github.event.inputs.tag || github.ref_name }}
          push: true
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max

      - name: Merge new cache to old cache
        if: success() && env.IMAGE_EXISTS == 'false'
        run: rsync -a /tmp/.buildx-cache-new/ /tmp/.buildx-cache/

  deploy:
    name: Deploy to GKE
    runs-on: ubuntu-latest
    needs: build-and-publish

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.tag || github.ref }}

      - id: "auth"
        uses: "google-github-actions/auth@v2.1.3"
        with:
          workload_identity_provider: "projects/xx/locations/global/workloadIdentityPools/github/providers/xx-repo"
          service_account: "github-actions-sa@xx-svc-ddx1.iam.gserviceaccount.com"

      - name: Setup gcloud CLI
        uses: google-github-actions/setup-gcloud@v2.1.0
        with:
          project_id: ${{ env.PROJECT_ID }}

      - name: Get the GKE credentials so we can deploy to the cluster
        uses: google-github-actions/get-gke-credentials@v2.2.0
        with:
          project_id: ${{ env.PROJECT_ID }}
          cluster_name: ${{ env.GKE_CLUSTER }}

      - id: "simple-test"
        run: "kubectl get nodes -v=10"

      - name: Download and setup kustomize
        run: |
          cd operations/xx-backend
          curl -sfLo kustomize https://github.com/kubernetes-sigs/kustomize/releases/download/v3.1.0/kustomize_3.1.0_linux_amd64
          chmod u+x kustomize

      - name: Deploy the Docker image to the GKE cluster
        run: |
          cd operations/xx-backend
          ./kustomize edit set image $PROJECT_ID/$REPOSITORY/$IMAGE:${{ github.event.inputs.tag || github.ref_name }}
          ./kustomize build . | kubectl apply -f -
          kubectl rollout status deployment/$DEPLOYMENT_NAME -n xx
          kubectl get services -o wide -n xx

Log output

No response

Additional information

No response

sethvargo commented 4 months ago

Hi @yanqianglu - if your cluster does not have an IP address or connector, then it's not accessible from the Internet, therefore GitHub Actions runners would not be able to connect to it. You need to use a Connect Gateway.
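
For a fleet-registered cluster, Connect Gateway support is built into this action via the use_connect_gateway input. A minimal sketch, assuming the cluster has been registered as a fleet membership (names are placeholders; check the action's README for the exact membership format expected in cluster_name):

```yaml
- uses: google-github-actions/get-gke-credentials@v2
  with:
    project_id: my-project           # placeholder
    cluster_name: my-private-cluster # placeholder; must match the fleet membership
    location: us-west1
    use_connect_gateway: 'true'      # route kubectl traffic through Connect Gateway
```

With this set, kubectl talks to the Connect Gateway endpoint instead of the cluster's control-plane IP, so no inbound connectivity to the private cluster is needed.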

yanqianglu commented 4 months ago

> Hi @yanqianglu - if your cluster does not have an IP address or connector, then it's not accessible from the Internet, therefore GitHub Actions runners would not be able to connect to it. You need to use a Connect Gateway.

Hi, my cluster is not managed by a fleet, and it's not GKE Enterprise either, so I probably can't use Connect Gateway. Is there any other way to connect to a private cluster?

A related question: according to https://github.com/google-github-actions/get-gke-credentials/blob/77f2de852b126198c28497b5ce36f09cab2a4816/src/main.ts#L52, use_internal_ip and Connect Gateway appear to be mutually exclusive, so what is the correct setup for using use_internal_ip?
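
For context: use_internal_ip only changes which endpoint the action writes into the kubeconfig; the runner itself must already be able to reach the control plane's private address, which GitHub-hosted runners cannot. A sketch of a working setup, assuming a self-hosted runner inside the cluster's VPC (the runner label is hypothetical; project and cluster values are taken from the workflow above):

```yaml
jobs:
  deploy:
    runs-on: [self-hosted, vpc-runner]  # hypothetical label for a runner with VPC access
    steps:
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/xx/locations/global/workloadIdentityPools/github/providers/xx-repo
          service_account: github-actions-sa@xx-svc-ddx1.iam.gserviceaccount.com

      - uses: google-github-actions/get-gke-credentials@v2
        with:
          cluster_name: projects/xx/locations/us-west1/clusters/autopilot-cluster-1
          use_internal_ip: 'true'  # kubeconfig points at the private endpoint

      - run: kubectl get nodes
```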

sethvargo commented 4 months ago

Correct, they are mutually exclusive. Either:

  1. Use Connect Gateway
  2. Establish a VPN connection/bastion host in a previous step, and then use the internal address to connect to the cluster

Private GKE clusters are not exposed to the Internet, so you need to establish a presence on the VPC network in order to connect.
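
Option 2 can be sketched as workflow steps using an IAP tunnel to a bastion VM that runs an HTTP proxy, then routing kubectl through it. The bastion name, zone, and proxy port are assumptions, and the VM must already exist with a proxy such as tinyproxy listening:

```yaml
- name: Open IAP tunnel to bastion
  run: |
    # "gke-bastion", zone, and port 8888 are placeholders; the VM must run
    # an HTTP proxy (e.g. tinyproxy) reachable on that port.
    gcloud compute start-iap-tunnel gke-bastion 8888 \
      --zone us-west1-a \
      --local-host-port localhost:8888 &
    sleep 5  # give the tunnel a moment to come up

- name: Run kubectl through the tunnel
  run: HTTPS_PROXY=localhost:8888 kubectl get nodes
```

Because the proxy sits on the VPC, kubectl can reach the control plane's private IP even though the GitHub-hosted runner has no direct route to it; the bastion's internal IP also needs to be in the cluster's authorized networks if that feature is enabled.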