actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Reconsider Aligning `actions-runner` software with Github-hosted software #2386

Open scruplelesswizard opened 1 year ago

scruplelesswizard commented 1 year ago

What would you like added?

I would like the existing images, which lack cloud-specific tooling, to be given a suffix such as -slim or -cloudless, and for the default actions-runner images to include the same installed software as their corresponding actions/runner-images images.

Why is this needed?

Now that Actions Runner Controller has been officially adopted into the actions/ org, I think it is worth reconsidering alignment of pre-installed software with actions/runner-images.

ARC's end users are often unaware that ARC is being used and expect consistent pre-installed software when developing their workflows. When ARC runners are provided to supplement or replace GitHub-hosted runners, errors are likely to occur due to the delta between GitHub-hosted runners and ARC-hosted runners.

While some end-user friction can be mitigated by an organization's ARC operators, they may not have insight into all tooling used across the org, leading to increased friction for ARC adoption. Sourcing the same software as actions/runner-images reduces that friction significantly.

Additional context

Installed Software for actions/runner-images

ubuntu-latest or ubuntu-22.04, ubuntu-20.04

I'm happy to help with the lift on this, but I wanted to float it in the community first for feedback.

cwoodcox commented 1 year ago

While I agree that the parity would be nice, just adding the az CLI for our runners increased the image size to over 2.1GB. While that is probably an outlier, I think that installing all of that tooling would lead to the image growing to an unsustainable and unusable size.

scruplelesswizard commented 1 year ago

> While I agree that the parity would be nice, just adding the az CLI for our runners increased the image size to over 2.1GB. While that is probably an outlier, I think that installing all of that tooling would lead to the image growing to an unsustainable and unusable size.

I absolutely agree that offering a small, trimmed-down image that can be used or customized as needed is very important, as is keeping container images as small as possible.

At the same time, parity with the GitHub-hosted runners is a common expectation among teams adopting ARC, and its absence is a high-friction point for adoption. Providing GitHub-hosted parity images as a default option, while calling out the disadvantages in our documentation, would be a great way for teams to easily get started with ARC and then optimize for their use cases.

As an alternative we could prioritize keeping images small while offering some compatibility. For example, we could offer a matrixed set of images, based on Cloud and Language. This strategy would likely require a fairly significant CI lift to implement, and mean a much wider set of images to maintain. It would require ARC users to create many different runner sets if they are using multiple languages, which doesn't offer the low-friction adoption users expect.
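To make the maintenance cost of a matrixed image set concrete, here is a minimal sketch that enumerates the tags such a matrix would produce. The cloud and language names and the tag format are hypothetical, chosen only for illustration; the thread does not specify any of them. Only the two Ubuntu versions come from the issue text.

```python
from itertools import product

# Base versions mentioned in the issue; the other two axes are assumptions.
BASES = ["ubuntu-20.04", "ubuntu-22.04"]
CLOUDS = ["aws", "azure", "gcp"]               # hypothetical
LANGUAGES = ["go", "java", "node", "python"]   # hypothetical


def image_tags(bases, clouds, languages):
    """Return one tag per (base, cloud, language) combination."""
    return [f"{b}-{c}-{l}" for b, c, l in product(bases, clouds, languages)]


tags = image_tags(BASES, CLOUDS, LANGUAGES)
print(len(tags))   # 2 * 3 * 4 = 24 images to build and maintain
print(tags[0])     # ubuntu-20.04-aws-go
```

Even with these small assumed axes the matrix yields 24 images, and a workflow that needs two languages still has no single image that fits it.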

It's a matter of enabling ARC adopters to "make it work, then make it good".

mumoshu commented 1 year ago

Hey! Thanks for the detailed feedback. While I generally agree that the "make it work, then make it good" way of getting started with self-hosted runners is great for usability, I'm unsure if that's really what everyone wants if we implement it naively. I'd love some design discussion first.

My biggest concern is that, when I last checked, the full runner image could be larger than 10GB. Defaulting to an image of this size wouldn't be an option until Kubernetes and its cloud offerings have sane support for somehow distributing/prepopulating/warming up the container image so that runner pods can still come up in several seconds, not a minute or two or more (depending on where your runner pod is going to be hosted... a raspi in a home network?).
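As a rough illustration of why image size matters for pod start-up, the sketch below estimates a lower bound on image transfer time. The link speeds are assumptions, not measurements from the thread, and the estimate ignores decompression, layer extraction, and registry throttling, so real pulls on a cold node would take longer.

```python
def pull_seconds(image_gb: float, link_mbit_per_s: float) -> float:
    """Lower bound on transfer time for an uncached image pull."""
    image_megabits = image_gb * 1000 * 8  # decimal GB -> megabits
    return image_megabits / link_mbit_per_s


# Image sizes quoted in the thread; link speeds are hypothetical.
for size_gb in (2.1, 10):
    for link in (100, 1000):
        secs = pull_seconds(size_gb, link)
        print(f"{size_gb} GB over {link} Mbit/s: at least {secs:.0f} s")
```

Under these assumptions a 10GB image needs at least 80 seconds on a 1 Gbit/s link and over 13 minutes on a 100 Mbit/s link, before any extraction work begins.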

toast-gear commented 1 year ago

> At the same time, parity with the GitHub-hosted runners is a common expectation among teams adopting ARC, and its absence is a high-friction point for adoption.

I've always assumed the full docker images provided by the https://github.com/nektos/act project were an accurate representation of the size to expect with a runner container with software parity with GitHub's runners. https://hub.docker.com/layers/catthehacker/ubuntu/full-20.04/images/sha256-598b616a8c7ce86d98ee63871cec532f4ff645125b563a8798f2ae1c98928ec7?context=explore. ~14GB images are far far far too big to be a default image imo, you'll just be trading 1 friction point for another.

> As an alternative we could prioritize keeping images small while offering some compatibility. For example, we could offer a matrixed set of images, based on Cloud and Language. This strategy would likely require a fairly significant CI lift to implement, and mean a much wider set of images to maintain.

It's really a question for GitHub to ask and answer internally. There's a middle ground between as slim as possible and parity with their virtual environments, but with it comes increased overhead in maintaining the runner images. Only GitHub can really say whether that is something they are willing to take on or not.

scruplelesswizard commented 1 year ago

tl;dr: It seems more probable that an ARC operator will be aware of our image sizes than that the end-user will be aware of the runner implementation.

> ~14GB images are far far far too big to be a default image imo, you'll just be trading 1 friction point for another.

Image size is a valid concern. We offer a slim base runner image that can be filled with whatever tools are needed. That is a great option for many of our users.

Having a default runner image of ~15GiB would require significantly more network and storage resources, and would also require maintaining the pre-installed tools. The networking and storage for ARC distribution are already offered and managed by GitHub, with the exception of a few legacy sources. Much of the work for maintenance already exists within the actions ecosystem, but it would require some coordinated effort to incorporate into ARC.

The greatest impact would be on ARC's users, due to the size of the container image. However, this would only impact new users adopting ARC. Existing ARC users have already adopted the slim runner image directly, or use it as a base for their custom runner images.

It's worth considering that in some organizations ARC's operators may be unaware of the tools needed by the end-users. What could happen if an operator deploys ARC today with an expectation of parity? What options does an operator have if they are unable to build and maintain container images and end-user packages?

> My biggest concern is ... that runner pods can still come up in several seconds

Might an end-user prefer to have a runner come up slowly and behave as expected instead?

A workflow executed on an ARC runner is relatively indistinguishable from a workflow executed on a GitHub-hosted runner, other than the pre-installed tools. If a workflow's user is unaware of ARC's implementation and the pre-installed tools were missing, what might their experience be like? What if their organization allows use of both GitHub-hosted and ARC-hosted runners?

A few things I can think of that might improve the day-0 experience:

mumoshu commented 1 year ago

Thanks, @toast-gear and @chaosaffe! As of today, I'd agree with the first option @chaosaffe mentioned.

So I'd agree if we do this:

and we don't do the following:

Providing and recommending the use of 14GB container images would end up giving the wrong expectation to most users. They'll start and keep complaining about why runners won't come up fast, and our answers might be just "make your network/storage fast or introduce your own P2P image distribution mechanism or a node warm-up solution" which isn't practically easy.

chrispat commented 1 year ago

The goal of this project is to provide a functional model for our customers to scale their own self-hosted infrastructure that fits their needs, not to provide a self-hosted model that is at parity with what is offered by the GitHub Cloud.

With that in mind we will only build and support a minimal runner image that customers can use as a base for their own needs.

vyrwu commented 2 months ago

@chrispat That's fair, but it should be much more clearly communicated to users. It is not obvious that the ubuntu-22.04 self-hosted runner images are not at parity with GitHub's ubuntu-22.04 runners. Our developers expect a smooth transition from managed to self-hosted runners. The current state of things does not highlight the implicit expectation that self-hosted runner providers will ensure tooling parity with managed runners.
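One way an operator might surface the tooling gap early is a small check run inside the runner image. The sketch below reports which commands from an expected list are not on the PATH. The tool list is hypothetical and is not taken from actions/runner-images; an operator would substitute whatever their workflows rely on.

```python
import shutil

# Hypothetical list of commands that workflows are assumed to need.
EXPECTED_TOOLS = ["git", "curl", "docker", "az", "aws", "node", "python3"]


def missing_tools(tools, which=shutil.which):
    """Return the tools for which no executable is found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(EXPECTED_TOOLS)
if missing:
    print("Not installed on this runner:", ", ".join(missing))
else:
    print("All expected tools are present.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default the function uses `shutil.which` from the standard library.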