actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0
4.68k stars 1.11k forks

How to Customize Runner Environment? #1135

Open Fleshgrinder opened 2 years ago

Fleshgrinder commented 2 years ago

We are extending the runner base image to provision additional software similar to what GitHub does in their virtual environments. These provisioners also need to set up additional environment variables dynamically since they install software with dynamic versions. The natural choice to do so is by adding those environment variables to /etc/environment. However, this file is never loaded by the runners.

Hence, we set out to find out what is going on, and it seems that there is no reliable way to actually do this, other than manipulating files of the base image. This is obviously brittle, since those files can change at any time and are not standardized like the files of the operating system (e.g. /etc/environment). Let me explain what we found.

The entrypoint is set to /usr/local/bin/dumb-init and the initial command is set to /entrypoint.sh. The /entrypoint.sh script copies all non-dot files from /runnertmp to /runner. Later it replaces the standard runsvc.sh with a patched runsvc.sh. This file contains the following lines:

https://github.com/actions-runner-controller/actions-runner-controller/blob/0b9bef2c086a54b0731fa7aa0a6bd7cb856b48ee/runner/patched/runsvc.sh#L7-L13

The .path file can never exist (unless it is created by one of the commands called in /entrypoint.sh), since dot files are not copied over to /runner. The line stating that this is the place to customize the environment is surely well intended, but it assumes that editing alien scripts is done by a human, which is not the case for us. Customizing it with automation is, as stated before, a bad idea since it may break a lot of things at any point in time.
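
The referenced lines behave roughly like this (paraphrased into a function for illustration; see the link above for the actual script):

```shell
# Paraphrased sketch of the .path handling in runsvc.sh: if a .path file
# exists in the runner root, its contents replace the PATH of the runner
# process. Simplified into a function here for illustration only.
load_dot_path() {
  local runner_root="$1"
  if [ -f "$runner_root/.path" ]; then
    PATH="$(cat "$runner_root/.path")"
    export PATH
  fi
}
```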

Imho it would be best to simply read /etc/environment since that is what it is intended for, but maybe there are good reasons not to do so?

mumoshu commented 2 years ago

@Fleshgrinder Hey!

The .path file

The whole runsvc.sh is mostly a copy of that from upstream: https://github.com/actions/runner/blob/main/src/Misc/layoutbin/runsvc.sh

so

The line that states that this is the place to customize the environment is for sure well intended

Maybe, but to be clear, that's designed by GitHub, not by us, so we'd better think about our own use case independently.

Fleshgrinder commented 2 years ago

Sure thing, I was only running down the rabbit hole. My expectation was that /entrypoint.sh provides the customization functionality, but it does not provide anything either.

toast-gear commented 2 years ago

If all you're trying to do is have some environment variables baked into your runner and loaded by the service, you just need a .env file at the root of the service path, in key=value format. The runner deliberately does not load other env files, so you get a deterministic environment.
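
For example, such a .env file in key=value format (the names and values below are purely illustrative):

```
CUSTOM_TOOL_HOME=/opt/custom-tool
CUSTOM_TOOL_VERSION=1.2.3
```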

mumoshu commented 2 years ago

/etc/environment

I'm not entirely sure if that's the single right way for us. Does every operating system support it, and does it make sense within the context of containers? 🤔

Also, regardless of whether /etc/environment is right or not, I'd like to keep things simple so that I can maintain them with our limited time and resources. While I'd like to help as many folks as possible as an open-source enthusiast, adding anything that complicates maintenance and user support in exchange for little benefit should be avoided.

One idea: can you build a custom runner image, renaming the standard entrypoint.sh to another name, include a custom entrypoint.sh at /entrypoint.sh, and do whatever you want there? The custom entrypoint.sh could exec the original one so that it behaves like the original entrypoint.sh in the end.
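
A wrapper along those lines might look like this (a sketch; /entrypoint-original.sh is a name chosen here for the renamed original, not something the image provides):

```shell
#!/usr/bin/env bash
# Hypothetical custom /entrypoint.sh: run site-specific setup, then hand
# control to the renamed original entrypoint.
set -Eeuo pipefail

# Site-specific customization, e.g. exporting extra environment variables
# (assumes /etc/environment contains simple key=value lines).
if [ -f /etc/environment ]; then
  set -a  # export every variable assigned while this is on
  . /etc/environment
  set +a
fi

# exec replaces this shell so signals reach the runner as before.
if [ -x /entrypoint-original.sh ]; then
  exec /entrypoint-original.sh "$@"
fi
```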

Or, what if we enhanced our entrypoint.sh to automatically source some bash scripts under a predefined path, so that you could add your own script that reads /etc/environment and exports every environment variable defined in it, or something like that? Would that work for your use case?
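
As a sketch of that idea (the hook directory name /etc/arc-runner-hooks.d is hypothetical, not an existing convention):

```shell
# Hypothetical addition to entrypoint.sh: source every hook script under a
# predefined directory so users can extend the environment without patching
# ARC-owned files.
run_env_hooks() {
  local hook_dir="${1:-/etc/arc-runner-hooks.d}" hook
  if [ -d "$hook_dir" ]; then
    for hook in "$hook_dir"/*.sh; do
      [ -f "$hook" ] || continue  # glob matched nothing
      # shellcheck disable=SC1090
      . "$hook"
    done
  fi
}
```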

mumoshu commented 2 years ago

Ah, never mind if .env works. It's just that I wasn't aware of it!

Fleshgrinder commented 2 years ago

@toast-gear that is the goal, yes. What is the “root of the service path?” Hopefully not /runner since .env would not be copied over from /runnertmp. 😐

@mumoshu anything works, /etc/environment is just a standard way of doing this in some operating systems and I picked it as a possible solution, but it was not meant to be the only solution.

Changing the CMD from /entrypoint.sh to something else would be my workaround, but it would be great to have a built-in functionality that does not require the addition of such drastic measures. ☺️

mumoshu commented 2 years ago

@Fleshgrinder Thanks!

Just confirming, but will my second idea work?

Or, what if we enhanced our entrypoint.sh to automatically source some bash scripts under a predefined path, so that you could add your own script that reads /etc/environment and exports every environment variable defined in it, or something like that? Would that work for your use case?

toast-gear commented 2 years ago

Imo adjusting https://github.com/actions-runner-controller/actions-runner-controller/blob/1b911749a65ed6b767c2320c7eba9f34b53408e2/runner/entrypoint.sh#L83 to also copy a .env file, and to not error if there isn't one, is probably the least confusing, least hacky fix. We aren't changing expected actions/runner behaviour then.
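
That adjustment could be as small as the following sketch (the function form and paths are illustrative, mirroring the /runnertmp to /runner copy):

```shell
# Copy an optional .env from the assets dir into the runner home; do not
# fail when the file is absent, so existing images keep working.
copy_optional_env() {
  local src="$1" dest="$2"
  if [ -f "$src/.env" ]; then
    cp "$src/.env" "$dest/.env"
  fi
}
```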

Fleshgrinder commented 2 years ago

To generally copy dot files from /runnertmp we could simply tell Bash to do so:

shopt -s dotglob
cp -r /runnertmp/* "$RUNNER_HOME/"
shopt -u dotglob

Additionally, it is probably a good idea to change it further so that it actually uses the environment variable that defines the /runnertmp path. Otherwise, having that variable is confusing, since it does not actually affect anything.

shopt -s dotglob
cp -r "$RUNNER_ASSETS_DIR"/* "$RUNNER_HOME"/
shopt -u dotglob
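
As an aside, appending /. to the source path also makes cp include dotfiles, without toggling shell options (wrapped in a function here purely for illustration):

```shell
# Copying a directory's contents via `src/.` includes dotfiles such as
# .env and .path, with no dotglob toggling required.
copy_runner_assets() {
  cp -r "$1"/. "$2"/
}
```
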

toast-gear commented 2 years ago

Sounds reasonable to me, give it a try and raise a PR if it works. Either way let us know your results.

Fleshgrinder commented 2 years ago

What I can already report back is that the content of /etc/environment is available whenever sudo is used, but not for the normal runner user. This is because sudo honors the standard loading of environment variables in accordance with the operating system standards. I get the feeling that the real problem is with LoadAndSetEnv of https://github.com/actions/runner since it should honor that as well.

We should still add the dotfile copying, since it is the current way it should work, but I am going to create a ticket with GitHub to clarify this.
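
A pam_env-style loader could look roughly like this (a sketch; pam_env's real parsing has more rules, this only handles plain KEY=value lines and strips surrounding double quotes):

```shell
# Hypothetical helper: export every variable from an /etc/environment-style
# file, approximating what PAM does for login sessions.
load_etc_environment() {
  local file="${1:-/etc/environment}" line key value
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in ''|'#'*) continue ;; esac  # skip blanks and comments
    key="${line%%=*}"
    value="${line#*=}"
    value="${value#\"}"
    value="${value%\"}"                        # strip optional double quotes
    export "$key=$value"
  done < "$file"
}
```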

Fleshgrinder commented 2 years ago

Another thing to report is that the PATH is not set correctly for the root user, as it contains the .local/bin directory from the runner user:

$ env | grep -F PATH
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/runner/.local/bin
$ sudo env | grep -F PATH
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/runner/.local/bin

The :/home/runner/.local/bin should not be part of the root’s PATH for obvious reasons. The reason for this seems to be here:

https://github.com/actions-runner-controller/actions-runner-controller/blob/7156ce040ec65dd80863f390f8397dbf01c8466d/runner/Dockerfile#L116-L117

Although I am not entirely sure, because ${HOME} for root should not point to /home/runner but rather to /root. Plus, the DEBIAN_FRONTEND=noninteractive environment variable is also set in the Dockerfile but is only visible in the runner environment (not in the Dockerfile RUNs themselves, because the user is switched later, but in the final image):

https://github.com/actions-runner-controller/actions-runner-controller/blob/7156ce040ec65dd80863f390f8397dbf01c8466d/runner/Dockerfile#L11

env

I assume it has something to do with the .path file that is created somewhere by something; I could not pinpoint its source yet. Maybe you have an idea?

path

There definitely is this line that loads the .path variable and exports it, but how this can become the path for root is beyond me, because sudo should not inherit it.

https://github.com/actions-runner-controller/actions-runner-controller/blob/7156ce040ec65dd80863f390f8397dbf01c8466d/runner/patched/runsvc.sh#L7-L11

Fleshgrinder commented 2 years ago

I extended https://github.com/actions-runner-controller/actions-runner-controller/pull/1136 with loading /etc/environment the same way as PAM does in Ubuntu for all users. Since that is the behavior that we have in GitHub’s virtual-environments, see https://github.com/actions/runner/issues/1703#issuecomment-1050292326 and the following comments for more context.

I could not resolve the PATH issue yet, will dig deeper. That said, we could merge the pull request as-is and follow up with additional pull requests later to fix the other environment issues.

Fleshgrinder commented 2 years ago

Another issue with the environment is that all the ARC specific environment variables are leaking into the environment of the users:

$ printenv
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_PORT=tcp://1.1.1.1:33080
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_PORT_33080_TCP=tcp://1.1.1.1:33080
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_PORT_33080_TCP_ADDR=1.1.1.1
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_PORT_33080_TCP_PORT=33080
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_PORT_33080_TCP_PROTO=tcp
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_PORT_8443_TCP=tcp://1.1.1.1:8443
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_PORT_8443_TCP_ADDR=1.1.1.1
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_PORT_8443_TCP_PORT=8443
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_PORT_8443_TCP_PROTO=tcp
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_SERVICE_HOST=1.1.1.1
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_SERVICE_PORT=33080
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_SERVICE_PORT_HTTP=33080
ACTIONS_RUNNER_CONTROLLER_GITHUB_WEBHOOK_SERVER_SERVICE_PORT_METRICS_PORT=8443
ACTIONS_RUNNER_CONTROLLER_METRICS_SERVICE_PORT=tcp://1.1.1.1:8443
ACTIONS_RUNNER_CONTROLLER_METRICS_SERVICE_PORT_8443_TCP=tcp://1.1.1.1:8443
ACTIONS_RUNNER_CONTROLLER_METRICS_SERVICE_PORT_8443_TCP_ADDR=1.1.1.1
ACTIONS_RUNNER_CONTROLLER_METRICS_SERVICE_PORT_8443_TCP_PORT=8443
ACTIONS_RUNNER_CONTROLLER_METRICS_SERVICE_PORT_8443_TCP_PROTO=tcp
ACTIONS_RUNNER_CONTROLLER_METRICS_SERVICE_SERVICE_HOST=1.1.1.1
ACTIONS_RUNNER_CONTROLLER_METRICS_SERVICE_SERVICE_PORT=8443
ACTIONS_RUNNER_CONTROLLER_METRICS_SERVICE_SERVICE_PORT_METRICS_PORT=8443
ACTIONS_RUNNER_CONTROLLER_WEBHOOK_PORT=tcp://1.1.1.1:443
ACTIONS_RUNNER_CONTROLLER_WEBHOOK_PORT_443_TCP=tcp://1.1.1.1:443
ACTIONS_RUNNER_CONTROLLER_WEBHOOK_PORT_443_TCP_ADDR=1.1.1.1
ACTIONS_RUNNER_CONTROLLER_WEBHOOK_PORT_443_TCP_PORT=443
ACTIONS_RUNNER_CONTROLLER_WEBHOOK_PORT_443_TCP_PROTO=tcp
ACTIONS_RUNNER_CONTROLLER_WEBHOOK_SERVICE_HOST=1.1.1.1
ACTIONS_RUNNER_CONTROLLER_WEBHOOK_SERVICE_PORT=443
ACTIONS_RUNNER_CONTROLLER_WEBHOOK_SERVICE_PORT_HTTPS=443
AWS_DEFAULT_REGION=redacted
AWS_REGION=redacted
AWS_ROLE_ARN=arn:aws:iam::redacted:role/redacted
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
CI=true
CONTROLLER_MANAGER_METRICS_SERVICE_PORT=tcp://1.1.1.1:8443
CONTROLLER_MANAGER_METRICS_SERVICE_PORT_8443_TCP=tcp://1.1.1.1:8443
CONTROLLER_MANAGER_METRICS_SERVICE_PORT_8443_TCP_ADDR=1.1.1.1
CONTROLLER_MANAGER_METRICS_SERVICE_PORT_8443_TCP_PORT=8443
CONTROLLER_MANAGER_METRICS_SERVICE_PORT_8443_TCP_PROTO=tcp
CONTROLLER_MANAGER_METRICS_SERVICE_SERVICE_HOST=1.1.1.1
CONTROLLER_MANAGER_METRICS_SERVICE_SERVICE_PORT=8443
CONTROLLER_MANAGER_METRICS_SERVICE_SERVICE_PORT_HTTPS=8443
DEBIAN_FRONTEND=noninteractive
DOCKERD_IN_RUNNER=false
DOCKER_CERT_PATH=/certs/client
DOCKER_CONFIG=/etc/docker
DOCKER_HOST=tcp://localhost:2376
DOCKER_TLS_VERIFY=1
GH_NO_UPDATE_NOTIFIER=true
GITHUB_ACTION=__run_2
GITHUB_ACTIONS=true
GITHUB_ACTION_REF=
GITHUB_ACTION_REPOSITORY=
GITHUB_ACTOR=redacted
GITHUB_API_URL=https://api.github.com/
GITHUB_BASE_REF=redacted
GITHUB_ENV=/runner/_work/_temp/_runner_file_commands/set_env_00000000-0000-0000-0000-000000000000
GITHUB_EVENT_NAME=pull_request
GITHUB_EVENT_PATH=/runner/_work/_temp/_github_workflow/event.json
GITHUB_GRAPHQL_URL=https://api.github.com/graphql
GITHUB_HEAD_REF=redacted
GITHUB_JOB=test
GITHUB_PATH=/runner/_work/_temp/_runner_file_commands/add_path_00000000-0000-0000-0000-000000000000
GITHUB_REF=refs/pull/0/merge
GITHUB_REF_NAME=0/merge
GITHUB_REF_PROTECTED=false
GITHUB_REF_TYPE=branch
GITHUB_REPOSITORY=redacted/redacted
GITHUB_REPOSITORY_OWNER=redacted
GITHUB_RETENTION_DAYS=90
GITHUB_RUN_ATTEMPT=0
GITHUB_RUN_ID=0
GITHUB_RUN_NUMBER=0
GITHUB_SERVER_URL=https://github.com/
GITHUB_SHA=0000000000000000000000000000000000000000
GITHUB_URL=https://github.com/
GITHUB_WORKFLOW=redacted
GITHUB_WORKSPACE=/runner/_work/redacted/redacted
HOME=/home/runner
HOSTNAME=redacted
ImageOS=ubuntu20
JAVA_HOME_11_X64=/usr/lib/jvm/temurin-11-jdk-amd64
JAVA_HOME_17_X64=/usr/lib/jvm/temurin-17-jdk-amd64
JAVA_HOME_8_X64=/usr/lib/jvm/temurin-8-jdk-amd64
KUBERNETES_PORT=tcp://1.1.1.1:443
KUBERNETES_PORT_443_TCP=tcp://1.1.1.1:443
KUBERNETES_PORT_443_TCP_ADDR=1.1.1.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_HOST=1.1.1.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
OLDPWD=/
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/runner/.local/bin
PWD=/runner/_work/redacted/redacted
RUNNER_ARCH=X64
RUNNER_ASSETS_DIR=/runnertmp
RUNNER_ENTERPRISE=
RUNNER_EPHEMERAL=true
RUNNER_FEATURE_FLAG_EPHEMERAL=true
RUNNER_GROUP=
RUNNER_LABELS=redacted
RUNNER_NAME=redacted
RUNNER_ORG=redacted
RUNNER_OS=Linux
RUNNER_TEMP=/runner/_work/_temp
RUNNER_TOOL_CACHE=/opt/hostedtoolcache
RUNNER_TRACKING_ID=github_00000000-0000-0000-0000-000000000000
RUNNER_WORKDIR=/runner/_work
RUNNER_WORKSPACE=/runner/_work/redacted
SHLVL=0
STARTUP_DELAY_IN_SECONDS=3
WEBHOOK_SERVICE_PORT=tcp://1.1.1.1:443
WEBHOOK_SERVICE_PORT_443_TCP=tcp://1.1.1.1:443
WEBHOOK_SERVICE_PORT_443_TCP_ADDR=1.1.1.1
WEBHOOK_SERVICE_PORT_443_TCP_PORT=443
WEBHOOK_SERVICE_PORT_443_TCP_PROTO=tcp
WEBHOOK_SERVICE_SERVICE_HOST=1.1.1.1
WEBHOOK_SERVICE_SERVICE_PORT=443
_=/usr/bin/env

I am not entirely sure which environment variables are the result of our doing (probably the AWS ones) and which ones are a result of ARC. I guess all the ACTIONS_RUNNER_CONTROLLER_*, CONTROLLER_MANAGER_*, KUBERNETES_*, RUNNER_*, STARTUP_DELAY_IN_SECONDS, and WEBHOOK_SERVICE_* are from ARC and should not be there.

mumoshu commented 2 years ago

ACTIONS_RUNNER_CONTROLLER_*, CONTROLLER_MANAGER_*, KUBERNETES_*, RUNNER_*, STARTUP_DELAY_IN_SECONDS, and WEBHOOK_SERVICE_* are from ARC and should not be there.

@Fleshgrinder Everything listed here, except RUNNER_*, are populated by K8s and should be available to any containers in a K8s pod. Those are not ARC-specific and useful for standard K8s use-cases like using envvars for service-discovery. Perhaps the default behavior should be to expose those envvars, although it can be a good idea to make it togglable via a feature flag to the runner container 🤔

RUNNER_* are populated by ARC and should be used only from within the runner's entrypoint.sh. Perhaps we can unset (or is there a better way?) those envvars, as they shouldn't be visible to jobs?

Fleshgrinder commented 2 years ago

Does anyone really desire workflows that are aware of being executed as part of ARC? I mean, it makes the workflows unportable. A toggle would be a good idea here.

unset is possible. The general issue I see with environment variables, as we are using them, is that users might set environment variables with the same names as part of their workflow. This does not interfere with the runners, since user environment variables are set up later, but if software within a workflow also reads an environment variable that accidentally matches one of the more generic ARC environment variables, it can lead to weird behavior. Obscuring the environment variables is a possibility here:

For instance, we could prefix all ARC-internal environment variables with __ARC_. This would allow us to clearly state that no environment variables starting with __ARC_ are allowed, and within entrypoint.sh we could have:

while read -r var; do
  if [[ "$var" == __ARC_* ]]; then
    unset "${var%%=*}"
  fi
done < <(env)

Even better would be to pass things by arguments to the runners. This way there is nothing to unset, nothing to leak, and it is thus much safer in general.

mumoshu commented 2 years ago

environment variables with the same names as part of their workflow

Everything we explicitly use in entrypoint.sh should not leak to workflows, so I thought you could just unset every known RUNNER_* var, and the other runner-specific envvars we explicitly pass down from the controller, before finally exec'ing runsvc.sh in our entrypoint.sh.

https://github.com/actions-runner-controller/actions-runner-controller/blob/e7200f274d592729b46848218c1d9c54214065c9/runner/entrypoint.sh#L172

Why do you need to prefix the envvars with __ARC_? 🤔 Your explanation makes sense, but as you've written, you intended to obscure our own envvars like DOCKERD_IN_RUNNER, STARTUP_DELAY_IN_SECONDS, and the other RUNNER_* envvars, right? That seems possible without __ARC_. Does it have any concrete benefit other than being very elegant?

mumoshu commented 2 years ago

Does anyone really desire workflows that are aware of being executed as part of ARC? I mean, it makes the workflows unportable. A toggle would be a good idea here.

This is about the K8s-populated envvars, right? I'd expect any container in a K8s pod to have the standard K8s-populated envvars available, because that's the portability K8s provides. If you weigh the portability of workflows more heavily, then yes, your idea makes a lot of sense.

A toggle would be nice, if we see real issues with those K8s-populated envvars.

Fleshgrinder commented 2 years ago

The main reason for __ARC_ really is the ability to reliably unset the variables without anyone having to remember to edit entrypoint.sh. It is good practice in general to have a common prefix for environment variables. We could of course also use RUNNER_ as that prefix and achieve the same. (I originally thought that users could interact with these environment variables through their workflows, but disproved this by trying it. So there is no real reason to obscure them.)

Well, ACTIONS_RUNNER_CONTROLLER_* and CONTROLLER_MANAGER_METRICS_SERVICE_* do not look like standard Kubernetes to me; they look specific to ARC. Anyway, a GHA workflow should not know whether it is running on Kubernetes or not. Generally speaking, that is, since there are parts where it truly makes a difference (e.g. a lot of our authentication is done automagically through AWS and Kubernetes machinery, which would obviously break if we executed it anywhere other than on our ARC setup).

I have no strong feelings here regarding anything. I see positives and negatives, but we should definitely clean the environment as much as possible to ensure that we are not leaking things that truly are of no use (those you mentioned). Having a common prefix would allow us to handle them generically. As it stands right now we could have the following:

unset DOCKERD_IN_RUNNER STARTUP_DELAY_IN_SECONDS
while read -r var; do
  if [[ "$var" == RUNNER_* ]]; then
    unset "${var%%=*}"
  fi
done < <(env)

Note that we can change the environment variables without breaking anything:

#!/usr/bin/env bash
set -Eeuo pipefail

# region backcomp
export RUNNER_DOCKERD_IN_RUNNER=${DOCKERD_IN_RUNNER:-false}
unset DOCKERD_IN_RUNNER

export RUNNER_STARTUP_DELAY_IN_SECONDS=${STARTUP_DELAY_IN_SECONDS:-0}
unset STARTUP_DELAY_IN_SECONDS
# endregion backcomp

# ... snip ...

while read -r var; do
  if [[ "$var" == RUNNER_* ]]; then
    unset "${var%%=*}"
  fi
done < <(env)

With this we could give users enough time to migrate before we remove the backcomp code.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Fleshgrinder commented 2 years ago

There is still the issue with the PATH being somehow polluted for the root user. I have not found out why that is yet.

davidsielert commented 2 years ago

If all you're trying to do is have some environment variables baked into your runner and loaded by the service, you just need a .env file at the root of the service path, in key=value format. The runner deliberately does not load other env files, so you get a deterministic environment.

Can you please clarify this? I've tried putting a .env file at / and /runner to no avail.

Edit: figured it out, it's /runnertmp/.env

remover commented 1 year ago

/runnertmp/.env

Is there any documentation on this? How can we ensure this file exists? I'm using Helm + Terraform to deploy this on GKE...

kevholmes commented 1 year ago

/runnertmp/.env

Is there any documentation on this? How can we ensure this file exists? I'm using Helm + Terraform to deploy this on GKE...

AFAIK, no. You could use an init container if your additions are simple, or take the existing runner images and build off of them, COPY'ing in your .env file as a final step in your own Dockerfiles. We have forked the full set of runner Dockerfiles and associated files into our own source control and modify them directly, which gives us full control over how the runners are built and what's included.
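
The image-extension route can be as small as this sketch (the base image name and tag are examples to adapt; /runnertmp is where the entrypoint copies runner files from, per the discussion above):

```
# Hypothetical Dockerfile extending a stock runner image; adjust the image.
FROM summerwind/actions-runner:latest
# Place the .env where the runner service picks it up after the copy step.
COPY .env /runnertmp/.env
```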

sandesh2026 commented 3 days ago

In the runner directory, create a .env file and add PATH=/usr/bin:/bin:/sbin:/usr/sbin:/usr/local/bin (add other paths as well if required) and save the file. Then restart the runner service. This worked for us!