actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Init container logs excessively during copy command #3637

Open schniedergers opened 4 months ago

schniedergers commented 4 months ago

Checks

Controller Version

0.9.2

Deployment Method

Helm

To Reproduce

1. Create an image that is based on ghcr.io/actions/actions-runner:latest and add `node` to it
2. Use that image as the runner image in oci://ghcr.io/actions/actions-runner-controller-charts
3. Observe the logs from the init container. As `node` adds a lot of files, the verbose copy creates thousands of log lines each time a runner starts.

Describe the bug

Not really a bug, but the verbose output clogs up log management and increases costs.

Describe the expected behavior

The init container should not log every copied file during its run.

Additional Context

Removing the `-v` from https://github.com/actions/actions-runner-controller/blob/80d848339e5eeaa6b2cda3c4a5393dfcb4614794/charts/gha-runner-scale-set/templates/_helpers.tpl#L90 reduces the logs significantly.
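
To gauge the difference locally, a minimal sketch along these lines can count how many lines `cp` writes to stdout with and without `-v`. The helper name `count_cp_output_lines` is hypothetical and not part of the chart or the runner image; it only assumes a `cp` that supports `-r` and `-v`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the contents of SRC into a throwaway directory
# using the given cp flags, and print how many lines cp wrote to stdout.
# Usage: count_cp_output_lines SRC [cp flags...]
count_cp_output_lines() {
  local src="$1"; shift
  local dst
  dst=$(mktemp -d)
  # With -v, cp prints one line per copied entry; without it, nothing.
  cp "$@" "$src/." "$dst/" | wc -l | tr -d ' '
  rm -rf "$dst"
}

# Example (run inside the runner image):
#   count_cp_output_lines /home/runner/externals -r -v
#   count_cp_output_lines /home/runner/externals -r
```

The first call should report roughly one line per copied entry, while the second should report 0.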

Controller Logs

-

Runner Pod Logs

Jul 03 15:26:58.148 i-00521ce9880b56851 at-gha-base '/home/runner/externals/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_sm2.h' -> '/home/runner/tmpDir/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_sm2.h'
Jul 03 15:26:58.148 i-00521ce9880b56851 at-gha-base '/home/runner/externals/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_rsa.h' -> '/home/runner/tmpDir/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_rsa.h'
Jul 03 15:26:58.148 i-00521ce9880b56851 at-gha-base '/home/runner/externals/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_ecx.h' -> '/home/runner/tmpDir/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_ecx.h'
Jul 03 15:26:58.148 i-00521ce9880b56851 at-gha-base '/home/runner/externals/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_ec.h' -> '/home/runner/tmpDir/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_ec.h'
Jul 03 15:26:58.148 i-00521ce9880b56851 at-gha-base '/home/runner/externals/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_dsa.h' -> '/home/runner/tmpDir/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_dsa.h'
Jul 03 15:26:58.148 i-00521ce9880b56851 at-gha-base '/home/runner/externals/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_digests.h' -> '/home/runner/tmpDir/./node20/include/node/openssl/archs/linux-x86_64/no-asm/providers/common/include/prov/der_digests.h'

bowtie-ltsa commented 1 month ago

FYI: as a workaround, we simply remove the `-v` from the `cp` command in the chart. Since we already specify our own chart template, this was simple to do.

That is, instead of

    command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]

we use

    command: ["/bin/bash", "-c", 'cp -r /home/runner/externals/. /home/runner/tmpDir/ || echo "cp: ERROR: exit code $?"; echo "cp: copied $(find /home/runner/tmpDir | wc -l)" directories and files']

More context:

gha-runner-scale-set:
  ## template is the PodSpec for each runner Pod
  ## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
  template:
    spec:
      # this init container copies files from the runner image to a volume that will be used by the dind container.
      # Standard runners use "cp -r -v" but under ARC and Loki that would log every file copied for every workflow
      # job in every org. (That's over 8000 lines of noise, per job per workflow per org.)
      # Instead, we omit the "-v" flag, echo the exit code if there is an error, and add a single-line summary.
      initContainers:
        - name: init-dind-externals
          image: artifactory.wu2.cloud.providence.org/docker-ghcr-remote/actions/actions-runner:latest
          command: ["/bin/bash", "-c", 'cp -r /home/runner/externals/. /home/runner/tmpDir/ || echo "cp: ERROR: exit code $?"; echo "cp: copied $(find /home/runner/tmpDir | wc -l)" directories and files']
          volumeMounts:
            - name: dind-externals
              mountPath: /home/runner/tmpDir
      containers:
        - name: runner
        # etc ...

ref: https://github.com/actions/actions-runner-controller/blob/96d1bbcf2fa961e7f64fad45ea8903b741cb3e16/charts/gha-runner-scale-set/values.yaml#L115-L121
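
One caveat with the one-liner above: its last command is `echo`, so the init container exits with status 0 even when `cp` fails. A minimal sketch of a variant that keeps the single-line summary but still propagates the failure is below. The function name `copy_quiet` is hypothetical and not part of the chart; the paths in the usage comment are the ones from the template above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the contents of SRC into DST without -v,
# print a one-line summary, and return cp's exit code so a failed copy
# still fails the init container.
# Usage: copy_quiet SRC DST
copy_quiet() {
  local src="$1" dst="$2" rc=0
  cp -r "$src/." "$dst/" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "cp: ERROR: exit code $rc" >&2
  fi
  echo "cp: copied $(find "$dst" | wc -l) directories and files"
  return "$rc"
}

# Example (inside the init container):
#   copy_quiet /home/runner/externals /home/runner/tmpDir
```

In the chart, the same logic could be inlined into the `bash -c` string. Whether a failed copy should block the runner pod from starting is a judgment call for each deployment.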