actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Using private ECR repos for actions #474

Closed: lennartjuette closed this issue 3 years ago.

lennartjuette commented 3 years ago

Hey there,

I'm currently getting started with actions-runner-controller on a self-hosted instance of GitHub Enterprise. The actions I want to use need to be hosted internally, including any Dockerfiles they may reference.

The problem is that I don't know how to make Docker pick up the AWS credentials.

AFAIK there are roles in place which pods in our cluster can assume. So if I run aws ecr get-login-password | docker login --username AWS --password-stdin <MY-ECR> inside an action and then work with Docker, this works just fine. But how do I tell the Docker container to pick up those credentials?

I'm not even sure which container inside the runner pod pulls the images for the actions. And even if I did: any step I run will only execute after all images and actions for all steps in the job have been pulled, right? So the job will fail before I can log in.

What did I miss?

callum-tait-pbx commented 3 years ago

We are an AWS shop, so with some more details I can tell you. Can you post an example workflow? It's not clear whether you are talking about the runner container or an arbitrary container.

lennartjuette commented 3 years ago

Actually I found a solution that works for me, but I hope there's something more "out of the box".

The problem is not that the runners won't spawn; they also use a copy of summerwind/actions-runner:v2.277.1-ubuntu-20.04 from our own ECR. But when an action specifies a Docker image in uses:, that image cannot be pulled. Example:

[Screenshot omitted: Screenshot_2021-04-22_at_17_43_52]

What works for me is the following:

Dockerfile

FROM summerwind/actions-runner:v2.277.1-ubuntu-20.04
RUN sudo apt-get update \
    && sudo apt-get install -y --no-install-recommends amazon-ecr-credential-helper \
    && sudo rm -rf /var/lib/apt/lists/*
COPY config.json /home/runner/.docker/config.json

config.json

{
    "credHelpers": {
        "<aws-id>.dkr.ecr.eu-central-1.amazonaws.com": "ecr-login"
    }
}

RunnerDeployment

---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: emh-runner
  namespace: actions-runner
spec:
  template:
    spec:
      organization: <my-org>
      dockerMTU: 1200
      image: <aws-id>.dkr.ecr.eu-central-1.amazonaws.com/summerwind/actions-runner:v2.277.1-ubuntu-20.04-ecrhelper
      imagePullPolicy: Always
[...]

That way the steps are able to pull the required images without any further tweaking. Is there a nicer way to do this?
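The config.json above grows by one credHelpers entry per registry host, for example when action images live in ECR repos in several regions or accounts. As a minimal sketch (the function name and the account ID are hypothetical), the file could be generated instead of written by hand. It relies on Docker's convention that a credHelpers value of "ecr-login" makes the CLI run the docker-credential-ecr-login binary, which the amazon-ecr-credential-helper package installs.

```python
import json


def build_docker_config(registries):
    """Return a Docker client config that routes each registry host to ecr-login.

    For every host listed under credHelpers, the Docker CLI runs the binary
    docker-credential-<value>, so "ecr-login" selects docker-credential-ecr-login.
    Duplicate hosts are collapsed and the output is sorted for stable diffs.
    """
    return {"credHelpers": {host: "ecr-login" for host in sorted(set(registries))}}


if __name__ == "__main__":
    # Hypothetical account ID; replace with your own registry hosts.
    config = build_docker_config([
        "123456789012.dkr.ecr.eu-central-1.amazonaws.com",
    ])
    print(json.dumps(config, indent=4))
```

The generated file can then be baked into the image as above or mounted into the runner pod.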

callum-tait-pbx commented 3 years ago

If I am understanding you correctly, you are talking about using an arbitrary container hosted on ECR in a workflow, Docker-in-Docker style, i.e. via the container key or via the shell?

If so, then yes, you need to bake in that credential helper (we do the same), as you need something to perform the auth steps automatically.

lennartjuette commented 3 years ago

Actually, I'm using an action published as a Docker image, with the tiny difference that the registry is private.

I noticed that using private images for actions is not mentioned in the GitHub docs.

But in theory it should be enough to install the AWS ECR credential helper in the runner container image and tell people to mount a simple Docker config from a ConfigMap to enable pulling from private repos.

callum-tait-pbx commented 3 years ago

But in theory it should be enough to install the AWS ECR credential helper in the runner container image and tell people to mount a simple Docker config from a ConfigMap to enable pulling from private repos.

You shouldn't even need to do this. We host the runner image on ECR, and our EKS nodes are able to pull the image regardless of whether the helper is part of the image or not. What is your environment, EKS or kops? Either way, your nodes should have an IAM role. Is the ECR repo hosting your runner image in the same account and region as your k8s nodes?

lennartjuette commented 3 years ago

My cluster is a CaaS provisioned by Gardener. I don't know much more than that, sorry.

Pulling the runner image is not the problem. AFAIK the nodes have a role applied to them which allows them to pull from ECR. The problem is for the runner to pull Docker images of actions from the exact same ECR. Docker inside the runner is not authenticated out of the box; it needs to use the amazon-ecr-credential-helper.

It makes sense to me, because at some point someone needs to fetch the credentials from AWS at least once. When pulling the runner image, the kubelet does this; inside the runner, the amazon-ecr-credential-helper takes over that task.

Right?
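The split described above can be sketched for the Docker CLI side only: the kubelet handles the runner image itself, while the CLI inside the runner picks a credential helper per registry host from ~/.docker/config.json. This is a simplified model, not Docker's implementation; the function names are hypothetical and the account ID is a placeholder. A per-registry credHelpers entry takes precedence over a global credsStore.

```python
# Simplified: Docker itself keys Docker Hub credentials under a different
# server address; "docker.io" is used here only as a readable stand-in.
DOCKER_HUB = "docker.io"


def registry_host(image):
    """Return the registry host of an image reference (optionally docker://-prefixed).

    Simplified version of Docker's rule: the first path component is treated as
    a registry host only if it contains "." or ":" or equals "localhost";
    otherwise the image is assumed to live on Docker Hub.
    """
    if image.startswith("docker://"):
        image = image[len("docker://"):]
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return DOCKER_HUB


def credential_helper_for(image, config):
    """Return the helper binary the Docker CLI would run for `image`, or None.

    A credHelpers entry for the host wins; otherwise a global credsStore is used.
    With neither, Docker falls back to plain `auths` entries (not modelled here).
    """
    host = registry_host(image)
    helper = config.get("credHelpers", {}).get(host) or config.get("credsStore")
    return f"docker-credential-{helper}" if helper else None
```

With the config.json from the first comment, an ECR-hosted action image resolves to docker-credential-ecr-login, while images from any other registry fall through to whatever else is configured.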

callum-tait-pbx commented 3 years ago

The problem is for the runner to pull Docker images of actions from the exact same ECR. Docker inside the runner is not authenticated out of the box; it needs to use the amazon-ecr-credential-helper.

Ah, it wasn't super clear if that was what you meant. That's why I asked for a workflow file: it makes what is going on clear and explicit, with no interpretation needed.

I would think that, as long as IRSA is configured correctly, running the ECR login action before any docker pull commands would also work:

    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v1

And if your ECR is in a different account than your cluster, assuming a new role before logging in will work:

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v1
      with:
          aws-region: eu-west-1
          role-to-assume: arn:aws:iam::$ACCOUNT_ID:role/$ROLE_WITH_ECR_PERMS
          role-duration-seconds: 900
    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v1

If I am understanding you correctly, you are talking about using an arbitrary container hosted on ECR in a workflow, Docker-in-Docker style, i.e. via the container key or via the shell?

That said, including the credential helper in your runner image means you can use the container key, so you probably want to include it regardless.

lennartjuette commented 3 years ago

The workflow will build (in the case of scripts/repos) or pull (in the case of images) all actions first, before any step runs. This means any step that might authenticate the runner against ECR is only executed after the first action's Docker image has already been pulled. That won't work.

Thanks for the effort you put into answering me. I found a solution that works for me right now. I'll report back if I ever find a better solution.
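The ordering problem described here can be modelled in a few lines. This is a simplified model of the behaviour reported in this thread, not the runner's actual implementation: every docker:// image referenced by a step is pulled during job setup, before any step executes, so an ECR login step in the same job cannot affect those pulls. The step dictionaries mirror workflow syntax; the function names and account ID are hypothetical.

```python
def setup_phase_pulls(steps):
    """Return the images pulled during job setup, in step order.

    Simplified model: each `uses: docker://...` step contributes one pull that
    happens before the first step runs; a repeated image is pulled only once.
    """
    images = []
    for step in steps:
        uses = step.get("uses", "")
        if uses.startswith("docker://"):
            image = uses[len("docker://"):]
            if image not in images:
                images.append(image)
    return images


def login_runs_too_late(steps):
    """True if the job has an ECR login step and also needs docker:// pulls.

    In this model the login step only runs after setup, i.e. after the pulls,
    so it cannot authenticate them.
    """
    has_login = any(
        s.get("uses", "").startswith("aws-actions/amazon-ecr-login") for s in steps
    )
    return has_login and bool(setup_phase_pulls(steps))
```

This is why the credentials have to be available before the job starts, e.g. via a credential helper already configured in the runner image.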

mumoshu commented 3 years ago

Hey! Thank you for sharing your experience.

I can see this works as expected, but requiring users to build their own custom runner images just to add default auth credentials seems like a bit of a bother.

What if we enhanced our entrypoint.sh to programmatically run some arbitrary script, so that you can basically run:

sudo apt-get update \
    && sudo apt-get install -y --no-install-recommends amazon-ecr-credential-helper \
    && sudo rm -rf /var/lib/apt/lists/*

# /some/vol should be a config volume mount. It's configurable via runner spec already
mkdir -p /home/runner/.docker
cp /some/vol/config.json /home/runner/.docker/config.json

before entrypoint.sh starts the agent.

lennartjuette commented 3 years ago

Ahoy!

I was actually thinking about adding the credential helper to the image (installed in the Dockerfile) to save us the installation every time a new pod is started.

Mounting the config is possible already, as you mentioned. But the mounted config is read-only, which may be problematic in some cases. I guess the enhancement you described would be helpful!
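One possible workaround for a read-only mount is to copy the mounted config to a writable location at startup, merging it with any config already there, so that commands like docker login can still write to it. A minimal sketch, assuming the mount path and target path come from your runner spec (both paths and the function names are hypothetical):

```python
import json
from pathlib import Path


def merge_docker_config(existing, mounted):
    """Merge a mounted (read-only) Docker config into an existing one.

    The per-registry maps `auths` and `credHelpers` are merged key by key, with
    the mounted values winning; any other top-level key is taken from `mounted`.
    """
    merged = dict(existing)
    for key, value in mounted.items():
        if key in ("auths", "credHelpers"):
            merged[key] = {**existing.get(key, {}), **value}
        else:
            merged[key] = value
    return merged


def install_docker_config(mounted_path, target_path):
    """Write the merged config to a writable path such as ~/.docker/config.json."""
    target = Path(target_path)
    existing = json.loads(target.read_text()) if target.exists() else {}
    mounted = json.loads(Path(mounted_path).read_text())
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(merge_docker_config(existing, mounted), indent=4))
```

Letting the mounted file win keeps the ConfigMap as the source of truth, while credentials written later by docker login land in the writable copy.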

callum-tait-pbx commented 3 years ago

What if we enhanced our entrypoint.sh to programmatically run some arbitrary script

Sounds like it would open a can of support worms down the road

https://github.com/awslabs/amazon-ecr-credential-helper

There is an AWS-provided solution which AWS maintains and supports. If I were a maintainer, I would personally look for people to make use of this before considering supporting a custom config here 🤷

mumoshu commented 3 years ago

Sounds like it would open a can of support worms down the road

@callum-tait-pbx Yeah, true... that would be a nightmare from the maintainers' and contributors' perspective. Can we instead add an explicit note to our README about when you need to build/use custom images?

callum-tait-pbx commented 3 years ago

Yeah, it might be worth adding something around that. To be honest, I think the project would really benefit from a roadmap for the next set of features and enhancements to be developed (not so much timeframes; this is an open-source, best-endeavours project after all).

There is a bit of a backlog of some really good features now, it might be worth considering a priority order. I don't know if you've seen https://github.com/summerwind/actions-runner-controller/issues/443 ?

lennartjuette commented 3 years ago

Adding the amazon-ecr-credential-helper to the runner images would be a start. Mounting your own config via existing mechanisms shouldn't be such a pain for the users.

mumoshu commented 3 years ago

@callum-tait-pbx Ah, sorry, I read #443 but forgot to ping @summerwind about enabling Discussions! Yeah, having a roadmap would be nice, and having a place to discuss our roadmap would be great.

callum-tait-pbx commented 3 years ago

It's up to the maintainers, but this project isn't targeting AWS specifically, so I don't think bundling AWS-specific tooling into the provided runners makes sense; the end user should be producing their own images for their environment.

If actions-runner-controller goes down the path of baking in AWS tools, then why not GCP or Azure tools too? Once tools are added to the base runner image, the project will need to support them when users raise issues, as the expectation has been set, and bump the tools' versions as new releases happen, etc.

lennartjuette commented 3 years ago

Good point.

On the other hand, it'll make working with the runners more cumbersome, because everyone has to bake their own images. You could define which clouds you support, document it, and stick with it. If someone needs newer tools, they can provide a PR.

But I get your point. In the end most people will most likely complain but not contribute.

mumoshu commented 3 years ago

@lennartjuette I hear you. To be clear, we are not committed to adding any environment-specific packages/helpers/configs/etc. to the default runner image yet. I admit it isn't very user-friendly, though.

I personally think the best strategy here would be for someone to create, e.g., an AWS-specific distribution of actions-runner-controller that includes a custom runner image containing amazon-ecr-credential-helper, so that we can focus on building a solid foundation while others focus on making it super easy for AWS users. But I'm not sure if that's feasible.

On the other hand, adding some general guidance, links, information, etc. to our README about potential solutions to common use cases would be great and relatively easy for us all, and I think it would still help you leverage this project.

lennartjuette commented 3 years ago

I'll close this issue for now. Feel free to take over my examples into your documentation.

For the sake of completeness, here's a sample workflow that demonstrates how I use the images, to avoid the confusion I created for @callum-tait-pbx (pulling the runner image vs. pulling an action image):

---
name: CI

on:
  push:

jobs:
  lint:
    runs-on: [self-hosted]
    steps:
      - uses: actions/checkout@v2
      - name: Lint Markdown files
        uses: docker://<ECR-ID>.dkr.ecr.eu-central-1.amazonaws.com/avtodev/markdown-lint:v1
      - name: 'Yamllint'
        uses: docker://<ECR-ID>.dkr.ecr.eu-central-1.amazonaws.com/karancode/yamllint-github-action:dd59165
[...]
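Given a workflow like the one above, a quick sanity check is whether the registry of every docker:// image has a credHelpers entry in the runner's config.json. The sketch below uses plain dictionaries in place of parsed YAML (parsing the file would need a third-party library such as PyYAML, left out to keep it self-contained). It assumes fully qualified image references as in the workflow, and only recognizes ECR hosts of the commercial-region form <account-id>.dkr.ecr.<region>.amazonaws.com; the function names are hypothetical.

```python
import re

# ECR private registry hosts: <account-id>.dkr.ecr.<region>.amazonaws.com
# (commercial regions only; other partitions use different domains).
ECR_HOST = re.compile(r"^(\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com$")


def ecr_account_and_region(host):
    """Return (account_id, region) for an ECR registry host, or None."""
    match = ECR_HOST.match(host)
    return match.groups() if match else None


def uncovered_registries(workflow, docker_config):
    """Return registry hosts of docker:// steps that lack a credHelpers entry.

    Assumes fully qualified references, i.e. the first path component is the host.
    """
    helpers = docker_config.get("credHelpers", {})
    missing = set()
    for job in workflow.get("jobs", {}).values():
        for step in job.get("steps", []):
            uses = step.get("uses", "")
            if uses.startswith("docker://"):
                host = uses[len("docker://"):].split("/", 1)[0]
                if host not in helpers:
                    missing.add(host)
    return sorted(missing)
```

Comparing the (account, region) pairs with the cluster's own account and region also shows when a cross-account setup, as discussed earlier in the thread, is involved.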

aleskiontherun commented 2 years ago

the end user should be producing their own images for their environment

That is reasonable until you realize that in this particular case it becomes a chicken-and-egg problem: to be able to build and push images in your cluster, you first need to build and push an image somewhere else. So you have to solve one problem in two completely different ways, which is quite frustrating, especially considering how much easier it would be to provide built-in support for specific cloud providers, e.g. in separate images extending the core image. I'll be happy to contribute if you change your mind about it.

mumoshu commented 2 years ago

@dizeee Thanks for the feedback! I do understand the bootstrapping problem is always hard, and do believe it would be great if we could provide a more streamlined way to bootstrap your self-hosted runners environment on AWS...

That said, what might be the next step given we already have three variants of the runner image? https://github.com/actions-runner-controller/actions-runner-controller/tree/master/runner We're even going to multiply the number of images due to https://github.com/actions-runner-controller/actions-runner-controller/pull/1688.

Will you be choosing a base image from the "rootless/rootful x dind within/outside runner" combinations, adding the credential helper to it, and publishing it as another runner image dedicated to self-hosted runners on AWS?