Goal
Understand the feasibility and complexity of leveraging the self-hosted runners launch template that VSP spent a couple of months creating. A couple of VA teams, content-build and vets-website, now run 100% on these runners. It takes minutes to spin up ECS containers in our pipelines, which creates a painful feedback cycle. We want to determine whether the time invested in adopting and using self-hosted runners will pay off down the road, including by shortening this feedback cycle. (goal and assumption)
Checklist
[ ] Note assumptions going into the spike and throughout it; these are a potential gauge of complexity
[ ] Discuss and document with team
[ ] Create recommendations
Timebox
Estimate how long this should take, in days. Typically 3.
Assumptions
Additional Info/Resources
This spike was born out of some preliminary research done for UI card #14
Links to self-hosted runners stood up by VSP/Demian Ginther
Notes from Ling's chat with Demian Ginther on July 1, 2021:
Runners are designated per repo
There is a GitHub Action that builds the runners
A separate process has the auto-scaling group launch them from the launch template
The runner was built using Terraform
Demian shared Terraform and use Packer to build the AMI OR
Learnings / gotchas that came up in the development process:
running multiple runners on the same instance
by default, each job in a workflow can use its own runner process, isolated from the others. In this setup, though, each job takes an entire EC2 instance. You can get around this by spinning up different-size runners (a bit of overhead), or by running as many runners on one instance as you want (but this can lead to disk space issues due to caching, so you need to prune Docker volumes, e.g. `docker volume prune`, once in a while)
if a job uses service containers like Postgres and binds to a port on the host, another job that runs on that same host can't use that port: it is already bound, because there is no compartmentalization within a runner instance
self-hosted runners can be run in a Docker container, but GitHub does not officially support running runners inside Docker
there is a Kubernetes operator that runs on an EKS cluster and can spin up ephemeral runners
the one gotcha is that a lot of users have to run jobs on separate runners, upload the data to the archive, and then re-download it to synthesize the results (content-build and vets-website)
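The port-binding gotcha above can be reproduced without Docker or a runner. The following is a minimal sketch, not taken from the VSP setup: `hold_port` and `can_bind` are hypothetical helper names, and a plain TCP socket stands in for a service container (such as Postgres) that has published a port on the host.

```python
import socket


def hold_port(host="127.0.0.1"):
    """Bind and listen on an OS-chosen free port.

    Stands in for the first job's service container holding a host port.
    Returns the open socket and the port number it occupies.
    """
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind((host, 0))  # port 0 asks the OS for any free port
    holder.listen()
    return holder, holder.getsockname()[1]


def can_bind(port, host="127.0.0.1"):
    """Return True if a new TCP socket can bind host:port right now.

    Stands in for a second job on the same host trying to publish
    the same port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:  # typically "address already in use"
            return False
        return True


if __name__ == "__main__":
    first_job, port = hold_port()
    print("second job can bind while first holds the port:", can_bind(port))
    first_job.close()
    print("second job can bind after first releases it:", can_bind(port))
```

With several runner processes sharing one instance, two jobs that both need the same host port collide in the same way; giving each job its own instance, or not binding service containers to fixed host ports, avoids the collision.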
Why not lambdas?
lambdas are hard to troubleshoot
General philosophy is the more you can use GitHub Actions the better
Self-hosted runners were a way to get around the VAEC network constraint
allows us to run larger runners than default GitHub runners
running 4 runner processes on each instance
some spin-up time for the instances, but it can be wasteful at night because it runs (there is work to address this issue)
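As a rough sizing aid for the spike, the "4 runner processes on each instance" figure from the notes can be turned into an instance count. This is only a back-of-the-envelope sketch: `instances_needed` is a hypothetical helper, and it assumes every job occupies exactly one runner process for its full duration.

```python
import math

# Figure from the notes: 4 runner processes share each EC2 instance.
RUNNERS_PER_INSTANCE = 4


def instances_needed(concurrent_jobs, runners_per_instance=RUNNERS_PER_INSTANCE):
    """Return how many instances are needed to run the given number of
    jobs at the same time, assuming one job per runner process."""
    if concurrent_jobs < 0:
        raise ValueError("concurrent_jobs must be non-negative")
    if runners_per_instance < 1:
        raise ValueError("runners_per_instance must be at least 1")
    return math.ceil(concurrent_jobs / runners_per_instance)


if __name__ == "__main__":
    for jobs in (1, 4, 5, 10):
        print(jobs, "concurrent jobs ->", instances_needed(jobs), "instance(s)")
```

Comparing this count against our actual peak and overnight job concurrency would show how much idle capacity the instances carry at night.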
Demian adapted the Packer build from GitHub…rewrote it from Azure to AWS, with some additions
The runners as-is should work for us (VANotify), so we can use the same Packer script. The AMI comes up and self-registers with GitHub for its repo, so not a lot of tweaking is needed
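To make "self-registers with GitHub" concrete, here is a hedged sketch of what repo-level registration generally involves. It is not Demian's script, which these notes do not show. It assumes GitHub's REST endpoint for creating a repository runner registration token and the `config.sh` script shipped with the runner package; the owner, repo, runner name, and token values are placeholders.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def registration_token_request(owner, repo, access_token):
    """Build (but do not send) the POST request that asks GitHub for a
    short-lived runner registration token for one repository."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/runners/registration-token"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {access_token}",
    }
    return urllib.request.Request(url, method="POST", headers=headers)


def config_command(owner, repo, registration_token, runner_name):
    """Build the argument list for the runner's config.sh so the instance
    registers against the repository without interactive prompts."""
    return [
        "./config.sh",
        "--url", f"https://github.com/{owner}/{repo}",
        "--token", registration_token,
        "--name", runner_name,
        "--unattended",
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    req = registration_token_request("example-org", "example-repo", "PLACEHOLDER_PAT")
    print(req.get_method(), req.full_url)
    print(" ".join(config_command("example-org", "example-repo", "PLACEHOLDER_TOKEN", "runner-1")))
```

Because runners are designated per repo, the main values we would expect to change for VANotify are the repository URL and whatever credential the instance uses to request the token.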
Out of Scope
-
Open Questions