Azure / k8s-deploy

GitHub Action for deploying to Kubernetes clusters
MIT License

Bug: Azure/k8s-deploy@v4 doesn't seem to be working after upgrading to AKS 1.24.9 #282

Open ealasgarov opened 1 year ago

ealasgarov commented 1 year ago

What happened?

I upgraded my private cluster to the latest stable version, 1.24.9, and since then I cannot get the pipeline to work. (I am also using a new service principal [Azure credentials] and a new ClusterRole/ClusterRoleBinding for that service principal, but I don't think the problem is there.)
I have deployed this way before with no problems, but now I get "error undefined" on the deploy step.
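
For context, the binding for the new service principal is roughly like this (the binding name, role, and object ID below are placeholders, not the real values):

```yaml
# Rough sketch of the ClusterRoleBinding for the new service principal.
# Name, role, and object ID are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: deploy-sp-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin           # in reality a more limited role
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: <service-principal-object-id>   # AAD object ID of the SP
```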

Here's my pipeline:

name: deploy-test
on: workflow_dispatch
jobs:
  deploy:
    runs-on: platform-aks-runner
    steps:
      - name: Checkout source code 
        uses: actions/checkout@v3
      - name: Set up kubelogin for non-interactive login
        run: |
          sudo rm -f /usr/local/bin/kubelogin
          curl -LO https://github.com/Azure/kubelogin/releases/download/v0.0.28/kubelogin-linux-amd64.zip
          sudo unzip -j kubelogin-linux-amd64.zip -d /usr/local/bin
          rm -f kubelogin-linux-amd64.zip
          kubelogin --version
      - name: Azure login
        id: login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS_DEV }}
      - name: Set AKS context
        id: set-context
        uses: azure/aks-set-context@v3
        with:
          resource-group: '${{ secrets.RESOURCE_GROUP_DEV }}' 
          cluster-name: '${{ secrets.CLUSTER_DEV }}'
          admin: 'false'
          use-kubelogin: 'true'
      - name: Setup kubectl
        id: install-kubectl
        uses: azure/setup-kubectl@v3
      - name: Deploy to AKS
        id: deploy-aks
        uses: Azure/k8s-deploy@v4
        with:
          resource-group: '${{ secrets.RESOURCE_GROUP_DEV }}' 
          name: '${{ secrets.CLUSTER_DEV }}'
          private-cluster: true 
          action: deploy
          force: true
          strategy: basic
          namespace: 'mynamespace'
          manifests: |
             ./resources.yaml
          images: '${{ secrets.registry_dev }}.azurecr.io/myrepo/myimage:latest'

I am not sure what else the issue could be. If the problem were with the credentials, I think I would get a different error. @OliverMKing any ideas perhaps? Many thanks in advance!

Version

Runner

self-hosted on AKS, latest version 2.303

Relevant log output

Run Azure/k8s-deploy@v4
  with:
    resource-group: ***
    name: ***
    private-cluster: true
    action: deploy
    force: true
    strategy: basic
    namespace: mynamespace
    manifests: ./resources.yaml

    images: ***.azurecr.io/myrepo/myimage:latest
    pull-images: true
    route-method: service
    version-switch-buffer: 0
    traffic-split-method: pod
    percentage: 0
    token: ***
    annotate-namespace: true
    skip-tls-verify: false
  env:
    AZURE_HTTP_USER_AGENT: 
    AZUREPS_HOST_ENVIRONMENT: 
    KUBECONFIG: /runner/_work/_temp/kubeconfig_1680215099131
    KUBE_CONFIG_PATH: /runner/_work/_temp/kubeconfig_1680215099131

Deploying manifests
  Error: Error: undefined

ealasgarov commented 1 year ago

P.S. I have applied the same deployment using user impersonation: kubectl apply -f resource.yaml --as="myserviceprinciple", and it works, so it cannot be a permissions issue, I think.

ealasgarov commented 1 year ago

OK, this is very odd: I set "private-cluster: false" and the deploy went through, even though my cluster is a private cluster and previously it only worked with "private-cluster: true". I can see on the Azure portal: "Private cluster: Enabled", so nothing has changed from that perspective.
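
To be concrete, the deploy step that now succeeds is identical to the one in my pipeline above, with only that flag flipped:

```yaml
- name: Deploy to AKS
  id: deploy-aks
  uses: Azure/k8s-deploy@v4
  with:
    resource-group: '${{ secrets.RESOURCE_GROUP_DEV }}'
    name: '${{ secrets.CLUSTER_DEV }}'
    private-cluster: false      # flipping this is the only change
    action: deploy
    force: true
    strategy: basic
    namespace: 'mynamespace'
    manifests: |
      ./resources.yaml
    images: '${{ secrets.registry_dev }}.azurecr.io/myrepo/myimage:latest'
```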

OliverMKing commented 1 year ago

Hello @ealasgarov. Thanks for the ping. This is very strange, because for private clusters the API server is not publicly accessible, and the API server is what kubectl and this action communicate with. The fact that your kubectl apply command worked seems to imply your cluster wasn't actually a private cluster (which doesn't make sense).

> P.S. I have applied the same deployment using user impersonation: kubectl apply -f resource.yaml --as="myserviceprinciple", and it works, so it cannot be a permissions issue, I think.

Was this run on the GitHub actions runner or some other way? Or is your actions runner a self-hosted runner?
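
As a sanity check independent of the portal, something like this (for example as a temporary step in the workflow) should show whether the API server is actually flagged as private. Resource group and cluster name are placeholders:

```yaml
# Temporary debugging step; resource group and cluster name are placeholders.
- name: Check private cluster flag
  run: |
    az aks show \
      --resource-group "$RESOURCE_GROUP" \
      --name "$CLUSTER_NAME" \
      --query "apiServerAccessProfile.enablePrivateCluster" \
      --output tsv
```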

ealasgarov commented 1 year ago

Hi @OliverMKing, yes, I'm using self-hosted runners which run on another AKS cluster (the github-runner-controller project). It's definitely a private cluster; it also says so on the Azure portal: "Private cluster: Enabled". Something has changed in the behavior recently (after I upgraded from 1.23.8 to 1.24.9). I think we need someone else with a private cluster on version 1.24.x to test this with the k8s-deploy action (with private-cluster set to false/true) to understand what has changed.

OliverMKing commented 1 year ago

@ealasgarov I will test this in a matching environment when I get the chance.

What I believe is happening is that your clusters (both the runner cluster and the deploy target) are in the same vnet. In that case you can reach the API server endpoint from the runner. What the private-cluster toggle does is switch from running plain kubectl apply (and other) commands to running them through az aks command invoke. If both clusters are in the same vnet you can just use normal kubectl commands, which is why the private-cluster: false scenario works (and is actually preferred).
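
For reference, with private-cluster: true the action effectively routes the same apply through the managed command endpoint, something along these lines (a sketch; the exact flags the action passes may differ):

```yaml
# Roughly what runs with private-cluster: true (sketch; exact flags may differ).
- name: Apply via az aks command invoke
  run: |
    az aks command invoke \
      --resource-group "$RESOURCE_GROUP" \
      --name "$CLUSTER_NAME" \
      --command "kubectl apply -f resources.yaml -n mynamespace" \
      --file resources.yaml
```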

This might require a documentation update to make it clearer. Hopefully, when I reproduce this, I'll find what the actual error with private-cluster: true is (I'm not sure why az aks command invoke wouldn't work). The error log itself is another bug; we need to add more context to our logs so we don't print meaningless error statements like this.

ealasgarov commented 1 year ago

Thank you for your reply. Actually, they are in different vnets, but there is a peering between them, so maybe you're right; I'm just not sure why it worked before the upgrade. It would be great if you could test it in a similar environment, and yes, more verbose logging would also be appreciated. :)
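
For reference, the peering state can be checked from the runner side with something like this (resource group and vnet name are placeholders):

```yaml
# Temporary debugging step to confirm the peering is actually connected.
# Resource group and vnet name are placeholders.
- name: Check vnet peering state
  run: |
    az network vnet peering list \
      --resource-group "$RUNNER_VNET_RG" \
      --vnet-name "$RUNNER_VNET_NAME" \
      --query "[].{name:name, state:peeringState}" \
      --output table
```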

github-actions[bot] commented 1 year ago

This issue is idle because it has been open for 14 days with no activity.