Add custom runners from cluster for GitHub Actions

sylus commented 4 years ago

Just adding this so it is documented: at the moment, some of our CI runs on larger images have hit quotas.

After talking with Microsoft, and from the meeting with David, it seems there are things we can do to leverage internal runners without security concerns?

https://help.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories

@zachomedia was at the meeting and mentioned this and our security concerns might have mitigations somehow?

If that is the case, this would be great, as I'd really like to ensure we have Trivy running on all containers and at least integrate some level of scanning for all publicly hosted images.

justbert commented 4 years ago

We could always spin up a side cluster just for the runners? It would isolate the workloads and prevent shenanigans.

sylus commented 4 years ago

Hello @aronchick, I'm following up from the conversation you had with us at Statistics Canada. Firstly, thanks again for your time on that, it was greatly appreciated!

I wanted to chat briefly about self-hosted GitHub Actions runners. Our concern is the security notice posted here: https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories

Forks of your public repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.

Could you elaborate on this / let us know if there is a way to prevent this situation?

Thanks in advance!

aronchick commented 4 years ago

Hi!

So the net of this is that it’s just about whether an opponent can get you to execute code on your runners (in your environment). In this case, the reason not to use self-hosted runners is if:

They can get you to pull in their code
Their code has obfuscated an attack somewhere
During the building/running of your workflows, it executes this code

If that’s done on your environment, I think you can see how it might be able to access potentially private information.

Does this make sense? You COULD do self-hosted runners if the self-hosted runners were isolated from anything else in your environment – just treat the runners as though they were external devices and that would be good isolation.
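
To make the fork scenario concrete, here is a minimal sketch (not GitHub's own logic) of checking whether a `pull_request` webhook payload comes from a fork, so those jobs can be kept away from runners that can reach internal systems. It assumes the payload carries `pull_request.head.repo.full_name` and `pull_request.base.repo.full_name`; the function name is hypothetical.

```python
from typing import Any, Mapping


def is_fork_pull_request(event: Mapping[str, Any]) -> bool:
    """Return True when the PR head lives in a different repo than the base."""
    pr = event.get("pull_request") or {}
    head_repo = (pr.get("head") or {}).get("repo") or {}
    base_repo = (pr.get("base") or {}).get("repo") or {}
    head_name = head_repo.get("full_name")
    base_name = base_repo.get("full_name")
    # If either side is missing (for example, the fork was deleted),
    # treat the PR as untrusted rather than guessing.
    if not head_name or not base_name:
        return True
    return head_name != base_name
```

A check like this only decides where a job may run; it does not replace isolating the runners themselves.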

sylus commented 4 years ago

Thanks @aronchick, that perfectly summed it up. I appreciate you taking the time :D

aronchick commented 4 years ago

No worries! Please let me know if I can help any further.

BTW, looking at the thread, this is exactly what @justbert had suggested earlier ;)

justbert commented 3 years ago

This is the start to identifying requirements for the self-hosted runners:

Security

As discussed above, the security of the environment is paramount. Because GitHub Actions workflows run code coming from an "outside source", we will need to ensure proper isolation.

To achieve this, a separate node pool, isolated from the rest, should be the minimum level of isolation put in place.
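
As a rough sketch of what pinning runners to a dedicated node pool could look like, the helper below adds a `nodeSelector` and a matching toleration to a pod spec. The label/taint key and value are hypothetical, and it assumes the runner nodes carry both that label and a `NoSchedule` taint with the same key, so other workloads stay off the pool.

```python
# Hypothetical label and taint applied to the dedicated runner node pool.
RUNNER_POOL_KEY = "example.com/node-purpose"
RUNNER_POOL_VALUE = "ci-runners"


def pin_to_runner_pool(pod_spec: dict) -> dict:
    """Return a copy of pod_spec that schedules only onto the runner node pool."""
    pinned = dict(pod_spec)
    # nodeSelector keeps the runner pods on the dedicated nodes.
    pinned["nodeSelector"] = {
        **pod_spec.get("nodeSelector", {}),
        RUNNER_POOL_KEY: RUNNER_POOL_VALUE,
    }
    # The toleration lets them land on nodes tainted to repel everything else.
    toleration = {
        "key": RUNNER_POOL_KEY,
        "operator": "Equal",
        "value": RUNNER_POOL_VALUE,
        "effect": "NoSchedule",
    }
    pinned["tolerations"] = [*pod_spec.get("tolerations", []), toleration]
    return pinned
```

This only covers scheduling; network isolation between the runner pool and the rest of the cluster would still need to be handled separately.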

Traffic Flows

Runners to Artifactory
Runners to API Server
Runners to Kubeflow?
GitHub to runners? (potentially via Webhook)
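
One possible way to enforce flows like these is a Kubernetes NetworkPolicy that denies inbound traffic and allows outbound traffic only to known destinations. The sketch below builds such a manifest as a Python dict; the policy name, namespace, and CIDRs are hypothetical placeholders, and DNS egress and any webhook ingress rule are deliberately left out and would need to be added.

```python
from typing import Dict, List


def runner_network_policy(namespace: str, allowed_egress: Dict[str, str], port: int = 443) -> dict:
    """Build a NetworkPolicy: no ingress, egress only to the given CIDRs on one TCP port."""
    egress_rules: List[dict] = [
        {
            "to": [{"ipBlock": {"cidr": cidr}}],
            "ports": [{"protocol": "TCP", "port": port}],
        }
        # Sort by destination name so the output is stable.
        for _name, cidr in sorted(allowed_egress.items())
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "runner-egress-allowlist", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [],  # no rules listed, so all inbound traffic is denied
            "egress": egress_rules,
        },
    }
```

Calling `runner_network_policy("ci-runners", {"artifactory": "192.0.2.10/32"})` would yield a manifest that can be serialized to JSON and applied, assuming the cluster's network plugin enforces NetworkPolicy.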

Tooling Surface

The virtual-environments defined for GitHub Actions are massive. These pull in large amounts of tooling, which can increase the surface area of potential security threats. If upstream images are used, they should be vetted; otherwise, it may be useful to create images that meet the needs of the AAW CI.

Istio

Will it work behind Istio? Are any of Istio's features required? Access via the IngressGateway may be required if the webhook is used.

Container requirements

Will these images or any sidecars require privileged capabilities?
Can rootless Docker-in-Docker be used?
- Should a central server be used for Docker builds to ensure security?

Docker-based actions may require root: https://docs.github.com/en/actions/creating-actions/dockerfile-support-for-github-actions#user
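
To help answer these questions for a candidate runner image or chart, a small audit like the following could flag containers that ask for privileged mode or may run as root. It is a sketch that only looks at the standard `securityContext` fields (`privileged`, `runAsUser`, `runAsNonRoot`) in a pod spec dict; the function name is hypothetical, and it cannot see the user baked into the image itself.

```python
from typing import List


def risky_containers(pod_spec: dict) -> List[str]:
    """List containers that request privileged mode or are not prevented from running as root."""
    findings: List[str] = []
    pod_ctx = pod_spec.get("securityContext") or {}
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        ctx = container.get("securityContext") or {}
        name = container.get("name", "<unnamed>")
        if ctx.get("privileged"):
            findings.append(f"{name}: privileged")
        # Container-level settings override pod-level ones.
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        if run_as_user == 0 or (run_as_user is None and not run_as_non_root):
            findings.append(f"{name}: may run as root")
    return findings
```

For a spec with an unprivileged runner and a privileged Docker sidecar, only the sidecar would be reported.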

Maintainability

GitHub Actions runs in pre-determined virtual-environments that are continuously updated. These are large toolboxes with a very large amount of tooling.

Custom images would need to be built or upstream solutions will need to be identified and vetted. This adds the burdens of development, maintenance, and vigilance onto us.

Whose responsibility is it to maintain and operate the runners?

NOTE: Since these runners are meant specifically for AAW, repository-scoped runners will need to be used. This could mean a good amount of idling compute if we aren't able to consolidate the runners or if we aren't able to scale them. This may also increase required maintenance.

API Limits

It looks as though API limits may also be an issue when setting up this infrastructure. The impact will need to be assessed further.
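
As a starting point for that assessment, the sketch below reads the `x-ratelimit-remaining` and `x-ratelimit-reset` headers that the GitHub REST API returns and works out how long a client should pause. The function name and the `min_remaining` threshold are hypothetical, and the `now` parameter exists only to make the function testable.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], min_remaining: int = 50,
                    now: Optional[float] = None) -> float:
    """Return how long to pause before the next API call, based on rate-limit headers."""
    lowered = {key.lower(): value for key, value in headers.items()}
    if "x-ratelimit-remaining" not in lowered or "x-ratelimit-reset" not in lowered:
        return 0.0  # no rate-limit information on this response
    remaining = int(lowered["x-ratelimit-remaining"])
    reset_at = int(lowered["x-ratelimit-reset"])  # epoch seconds
    if remaining > min_remaining:
        return 0.0
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - current)
```

Logging the remaining budget per controller sync over a few days would give a rough picture of whether the limits are a real constraint.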

Secrets Integration

Since we will be controlling our own CI infrastructure, it would be important to make the most of it. Providing secrets for access to internal systems would best be handled at this layer, as it would remove the need to store them in GitHub and would keep them entirely within our cloud.

Use of ServiceAccounts for API server access
Use of Vault to mount secrets for Artifactory
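
From the job's point of view, both approaches can come down to reading a file mounted into the runner pod. The sketch below assumes the standard ServiceAccount token path and a hypothetical Vault-mounted file for Artifactory; the `ARTIFACTORY_SECRET` path and the helper name are illustrative only.

```python
from pathlib import Path

# Standard location where Kubernetes mounts the pod's ServiceAccount token.
SERVICE_ACCOUNT_TOKEN = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")
# Hypothetical location for a Vault-provided Artifactory credential.
ARTIFACTORY_SECRET = Path("/vault/secrets/artifactory")


def read_mounted_secret(path: Path) -> str:
    """Read a secret from a mounted file, failing loudly if it is missing or empty."""
    value = path.read_text(encoding="utf-8").strip()
    if not value:
        raise ValueError(f"secret file {path} is empty")
    return value
```

A workflow step would call `read_mounted_secret(ARTIFACTORY_SECRET)` at run time, so the credential never has to be stored as a GitHub secret.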

Installation and Management Options

There are 3 options that I have currently found:

Custom Image and Chart

This solution involves creating or vetting an image and chart to deploy. It would be best to be able to tailor a reusable solution so that multiple runners could be launched easily. It would also be wise to ensure that these images are updated and scanned regularly.

This option may add a lot of maintenance, depending on the number of runners that need to be supported.

actions-runner-controller

The actions-runner-controller is a project currently being developed. It is still early in its development, however, it does offer quite a few features as well as a community from which to gain information.

It offers:

Runners at different levels: repository, organization, enterprise
Curated images that align with current virtual-environments
Scaling of runners via different mechanisms
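
For a sense of what a repository-scoped runner definition could look like, here is a sketch that builds a `RunnerDeployment` manifest as a dict. It assumes the project's `actions.summerwind.dev/v1alpha1` API and its `spec.template.spec.repository` field as I understand them from the project README; the resource name and repository are placeholders, and the schema should be checked against the version actually deployed.

```python
import json


def runner_deployment(name: str, repository: str, replicas: int = 1) -> dict:
    """Build a repository-scoped RunnerDeployment manifest (schema assumed, see note above)."""
    if "/" not in repository:
        raise ValueError("repository must look like 'owner/name'")
    return {
        "apiVersion": "actions.summerwind.dev/v1alpha1",
        "kind": "RunnerDeployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "template": {"spec": {"repository": repository}},
        },
    }


# Placeholder names; kubectl accepts JSON as well as YAML.
print(json.dumps(runner_deployment("example-runners", "example-org/example-repo", replicas=2), indent=2))
```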

github-actions-runner-operator

github-actions-runner-operator is another implementation that offers a simple, cloud-native way to define runners.

justbert commented 3 years ago

https://docs.github.com/en/actions/learn-github-actions/security-hardening-for-github-actions#hardening-for-self-hosted-runners

YannCoderre commented 2 years ago

Can we close this as we are using ArgoCD?

blairdrummond commented 2 years ago

@brendangadd should this be closed?

sylus commented 2 years ago

This should be closed since we are now using ArgoCD

StatCan / aaw