actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Node.js Executable Not Found When Using gcr.io/kaniko-project/executor:debug Base Image #3687

Open kanakaraju17 opened 2 months ago

kanakaraju17 commented 2 months ago

Controller Version

0.9.3

Deployment Method

Helm

To Reproduce

1. Deploy the `gha-runner-scale-set` chart with kubernetes container mode enabled (a sketch of a typical install follows).
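
For reference, a typical install with this configuration looks roughly like the following; the release name, namespace, and values file name are illustrative, and the values file itself is the one shown under "Additional Context" below.

# install the gha-runner-scale-set chart with kubernetes container mode enabled
helm install test-runners \
  --namespace arc-runners --create-namespace \
  -f values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set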

Describe the bug

I'm currently running GitHub Actions runners with kubernetes container mode enabled, using two different base images for the job container: ubuntu:latest and gcr.io/kaniko-project/executor:debug.

When I use gcr.io/kaniko-project/executor:debug as the base image, the Node.js executable is not found at the specified path, causing the workflow to fail immediately with the following error:

Upon inspecting the container, the Node.js files are present on disk, but the binary cannot be executed, and the shell fails with the error below:

sh: /__e/node20/bin/node: not found

When I exec into the pod using the gcr.io/kaniko-project/executor:debug image:

/__e # ls
node16  node20
/__e #
/__e # cd node20/
/__e/node20 # ls
CHANGELOG.md  LICENSE       README.md     bin           include       lib           share
/__e/node20 #
/__e/node20 # cd bin/
/__e/node20/bin # ls
corepack  node      npm       npx
/__e/node20/bin #
/__e/node20/bin # pwd
/__e/node20/bin
/__e/node20/bin # /__e/node20/bin/node
sh: /__e/node20/bin/node: not found
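
For what it's worth, a "not found" error from the shell for a binary that is clearly present on disk usually means the ELF interpreter (dynamic loader) the binary was linked against is missing from the image: gcr.io/kaniko-project/executor:debug ships only a static busybox shell, while the node20 build the runner copies into /__e is, by default, linked against glibc. A quick way to check this from the same busybox shell (assuming the standard glibc loader paths) would be:

# list the usual glibc dynamic loader locations; if the globs match nothing,
# ls fails (stderr suppressed) and the echo fallback runs instead
ls /lib/ld-linux-*.so.* /lib64/ld-linux-*.so.* 2>/dev/null || echo "no glibc dynamic loader in this image"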

Workflow File Using gcr.io/kaniko-project/executor:debug:

name: Build and Deploy

on:
  workflow_call:
    inputs:
      branch:
        required: true
        type: string
        default: 'main'
      build_registry:
        required: true
        type: string

jobs:
  arm-build:
    runs-on: [test-runners]
    container:
      image: gcr.io/kaniko-project/executor:debug
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0    

Working Example with ubuntu:latest

When using the ubuntu:latest image, the Node.js executable is present and the workflow runs as expected.

name: Build and Deploy

on:
  workflow_call:
    inputs:
      branch:
        required: true
        type: string
        default: 'main'
      build_registry:
        required: true
        type: string

jobs:
  arm-build:
    runs-on: [test-runners]
    container:
      image: ubuntu:latest
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0    

Node.js Executable Found:

root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin# ls
corepack  node  npm  npx
root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin#
root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin#
root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin# pwd
/__e/node20/bin
root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin# /__e/node20/bin/node
Welcome to Node.js v20.13.1.
Type ".help" for more information.
>

This issue is a major blocker, as it prevents us from using customized images that match our requirements. We need to run job containers from our own predefined images, with our custom tooling and packages installed.

I need help to resolve this issue so that the Node.js executable can be found and used correctly when using the gcr.io/kaniko-project/executor:debug image. Any insights or solutions would be greatly appreciated.

Describe the expected behavior

The workflow should run successfully and Node.js packages should be installed correctly, even when using a customized image such as gcr.io/kaniko-project/executor:debug. The Node.js executable should be found and functional, ensuring that all steps in the workflow proceed without errors.

Additional Context

## githubConfigUrl is the GitHub url for where you want to configure runners
## ex: https://github.com/myorg/myrepo or https://github.com/myorg
githubConfigUrl: "https://github.com/"

## githubConfigSecret is the k8s secret to use when authenticating with the GitHub API.
## You can choose to use GitHub App or a PAT token
# githubConfigSecret:
  ### GitHub Apps Configuration
  ## NOTE: IDs MUST be strings, use quotes
  #github_app_id: ""
  #github_app_installation_id: ""
  #github_app_private_key: |

  ### GitHub PAT Configuration
  # github_token: ""
## If you have a pre-defined Kubernetes secret in the same namespace that the gha-runner-scale-set is going to be deployed in,
## you can also reference it via `githubConfigSecret: pre-defined-secret`.
## You need to make sure your predefined secret has all the required secret data set properly.
##   For a pre-defined secret using GitHub PAT, the secret needs to be created like this:
##   > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_token='ghp_your_pat'
##   For a pre-defined secret using GitHub App, the secret needs to be created like this:
##   > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_app_id=123456 --from-literal=github_app_installation_id=654321 --from-literal=github_app_private_key='-----BEGIN CERTIFICATE-----*******'
githubConfigSecret: github-token

## proxy can be used to define proxy settings that will be used by the
## controller, the listener and the runner of this scale set.
#
# proxy:
#   http:
#     url: http://proxy.com:1234
#     credentialSecretRef: proxy-auth # a secret with `username` and `password` keys
#   https:
#     url: http://proxy.com:1234
#     credentialSecretRef: proxy-auth # a secret with `username` and `password` keys
#   noProxy:
#     - example.com
#     - example.org

# maxRunners is the max number of runners the autoscaling runner set will scale up to.
# maxRunners: 5

# minRunners is the min number of idle runners. The target number of runners created will be
# calculated as a sum of minRunners and the number of jobs assigned to the scale set.
minRunners: 2

runnerGroup: "test-runners"

## name of the runner scale set to create. Defaults to the helm release name
runnerScaleSetName: "test-runners"

## A self-signed CA certificate for communication with the GitHub server can be
## provided using a config map key selector. If `runnerMountPath` is set, for
## each runner pod ARC will:
## - create a `github-server-tls-cert` volume containing the certificate
##   specified in `certificateFrom`
## - mount that volume on path `runnerMountPath`/{certificate name}
## - set NODE_EXTRA_CA_CERTS environment variable to that same path
## - set RUNNER_UPDATE_CA_CERTS environment variable to "1" (as of version
##   2.303.0 this will instruct the runner to reload certificates on the host)
##
## If any of the above had already been set by the user in the runner pod
## template, ARC will observe those and not overwrite them.
## Example configuration:
#
# githubServerTLS:
#   certificateFrom:
#     configMapKeyRef:
#       name: config-map-name
#       key: ca.crt
#   runnerMountPath: /usr/local/share/ca-certificates/

## Container mode is an object that provides out-of-box configuration
## for dind and kubernetes mode. Template will be modified as documented under the
## template object.
##
## If any customization is required for dind or kubernetes mode, containerMode should remain
## empty, and configuration should be applied to the template.
containerMode:
  type: "kubernetes"  ## type can be set to dind or kubernetes
  ## the following is required when containerMode.type=kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    # For local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to provide dynamic provision volume with storageClassName: openebs-hostpath
    storageClassName: "gp3"
    resources:
      requests:
        storage: 5Gi
#   kubernetesModeServiceAccount:
#     annotations:

## listenerTemplate is the PodSpec for each listener Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
listenerTemplate:
  spec:
    nodeSelector:
      purpose: github-actions
    tolerations:
      - key: purpose
        operator: Equal
        value: github-actions
        effect: NoSchedule   
    containers:
    # Use this section to append additional configuration to the listener container.
    # If you change the name of the container, the configuration will not be applied to the listener,
    # and it will be treated as a side-car container.
    - name: listener
      resources:
        limits:
          cpu: "500m"
          memory: "500Mi" 
        requests:
          cpu: "250m"
          memory: "250Mi"
      # securityContext:
        # runAsUser: 1000
#     # Use this section to add the configuration of a side-car container.
#     # Comment it out or remove it if you don't need it.
#     # Spec for this container will be applied as is without any modifications.
#     - name: side-car
#       image: example-sidecar

## template is the PodSpec for each runner Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
template:
  template:
    spec:
      containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /etc/config/runner-template.yaml
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - mountPath: /etc/config
            name: hook-template
      volumes:
        - name: hook-template
          configMap:
            name: runner-config
        - name: work
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: "local-path"
                resources:
                  requests:
                    storage: 1Gi          
  spec:
    securityContext:
      fsGroup: 1001
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "false"
    nodeSelector:
      purpose: github-actions-arm
    tolerations:
      - key: purpose
        operator: Equal
        value: github-actions-arm
        effect: NoSchedule       

## Optional controller service account that needs to have required Role and RoleBinding
## to operate this gha-runner-scale-set installation.
## The helm chart will try to find the controller deployment and its service account at installation time.
## In case the helm chart can't find the right service account, you can explicitly pass in the following value
## to help it finish RoleBinding with the right service account.
## Note: if your controller is installed to only watch a single namespace, you have to pass these values explicitly.
# controllerServiceAccount:
#   namespace: arc-system
#   name: test-arc-gha-runner-scale-set-controller
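
Note that the runner-config ConfigMap referenced by ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE above is not included in this report. A minimal sketch of what that mount is assumed to contain, following the documented PodTemplate-style hook extension (all names and values here are illustrative), could look like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: runner-config
data:
  # key must match the file name the hook template env var points at (/etc/config/runner-template.yaml)
  runner-template.yaml: |
    apiVersion: v1
    kind: PodTemplate
    metadata:
      name: runner-pod-template
    spec:
      containers:
        # "$job" targets the workflow job container created by the kubernetes container hook
        - name: "$job"
          resources:
            requests:
              cpu: 250m
              memory: 256Mi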

Controller Logs

https://gist.github.com/kanakaraju17/4f58c0b332451ef6fab345a8078a6b3b

Runner Pod Logs

https://gist.github.com/kanakaraju17/c61f8da3038741634acea68f40c12afc
andersbackman-rf commented 3 weeks ago

Also interested in this, since we want to pull secrets into our repository workflows using hashicorp/vault-action.
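
For context, hashicorp/vault-action is a JavaScript action, so it needs the same node runtime inside the job container and would hit this same failure. A typical step looks something like the following; the Vault URL, auth method, role, and secret path are placeholders:

    steps:
      - name: Import secrets from Vault
        uses: hashicorp/vault-action@v3
        with:
          url: https://vault.example.com
          method: jwt
          role: ci-role
          secrets: |
            secret/data/ci registry_password | REGISTRY_PASSWORD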