ci: Change testing farm runs to not occupy a runner

cgwalters commented 5 months ago

The github.com/containers organization is pretty large and active, but only has the default 20 GitHub-hosted runners available right now.

We have this repo hooked up to Testing Farm. The problem here is that each of the 4 distinct TF runs we do occupies a whole GitHub-hosted Actions runner virtual machine just to poll a remote HTTP server, which is quite wasteful.
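As a rough back-of-the-envelope sketch of why this hurts: the 20 runners and 4 TF runs are the figures above, while the model (every PR triggers all 4 runs, each holding one runner for its whole duration) and the 60-minute duration are assumptions for illustration only.

```python
# Figures from the paragraph above; the per-PR trigger model is an assumption.
ORG_RUNNERS = 20     # default GitHub-hosted runners available to the org
TF_RUNS_PER_PR = 4   # distinct Testing Farm runs in this repo


def saturating_prs(runners: int = ORG_RUNNERS, runs_per_pr: int = TF_RUNS_PER_PR) -> int:
    """PRs with in-flight TF runs needed to occupy every runner."""
    return -(-runners // runs_per_pr)  # ceiling division


def polling_runner_minutes(tf_duration_min: int, runs: int = TF_RUNS_PER_PR) -> int:
    """Runner-minutes spent only polling for one PR."""
    return tf_duration_min * runs


print(saturating_prs())            # 5
print(polling_runner_minutes(60))  # 240 (60 minutes is a hypothetical duration)
```

So under these assumptions, a handful of PRs in this one repo is enough to starve the rest of the org.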

It looks to me like the TF action itself supports being configured to report status back to the PR, without holding a runner? There's an update_pull_request_status flag...

Hmm, are we doing things this way because the tests live in this git repository, so we have to clone it here?

cc @henrywang

henrywang commented 5 months ago

Yes, https://github.com/virt-s1/bootc-workflow-test is using update_pull_request_status. But that still needs a GitHub Actions runner. For example, https://github.com/virt-s1/bootc-workflow-test/actions/runs/8845171999/job/24288556567.

Or we can use self hosted runner (container). The https://github.com/virt-s1/bootc-workflow-report repo already uses a self-hosted GitHub Actions runner (container). For example, https://github.com/virt-s1/bootc-workflow-report/blob/c723ccca5482495d0860b6f187d971d0058d7560/.github/workflows/trigger-rhel-9-4.yml#L13. This is the deployment script for the self-hosted runner (container): https://github.com/virt-s1/kite-action/blob/main/tools/deploy_container.yaml

Self-hosted GitHub Actions runners do not autoscale, which means a runner has to exist before a job can use it. But I have a solution that supports autoscaling runners: https://github.com/virt-s1/kite-action/tree/main. RHEL for Edge QE CI and the osbuild-composer repo CI (RHEL for Edge part) have been using it for 2 years.

cgwalters commented 5 months ago

> Or we can use self hosted runner (container).

Oh yes that makes lots of sense. Hmm. Actually...yeah, we should wire up something like this officially for the whole github.com/containers organization. Just thinking through how it works, there's also https://github.com/redhat-actions/openshift-actions-runners and https://github.com/actions/actions-runner-controller, which looks even better (I think, although it's not clear offhand whether it supports using e.g. container: to configure the pod image as distinct from the runner host; if it doesn't, that weakens things a lot).

henrywang commented 5 months ago

Yeah, this is a really good solution for self-hosted runners. Three reasons I didn't use it:

1. This solution needs a public k8s cluster, and I can't find a free one.
2. RHEL for Edge needs a VM or bare metal server, not a container. (RHEL for Edge does not use Testing Farm yet.)
3. OpenShift is not supported.

I can spend some time on this solution and try something. That should be interesting.

thrix commented 5 months ago

Circling back to using Packit + Testing Farm instead: afaik people use the Packit + Testing Farm combo to get around the GitHub runner limits, since with that in place no GitHub runners are needed.

Just noting that Packit does not need to do any building, it can just trigger Testing Farm.

Also, Packit and Testing Farm sync using "webhooks", so Packit does not need to actively wait on Testing Farm; it gets notified by Testing Farm when the state changes. So it scales a lot better.
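To illustrate the general idea (not Packit's or Testing Farm's actual implementation), here is a minimal Python sketch of a notification receiver: nothing is held busy between state changes, and work happens only when a POST arrives. The payload shape, the `STATES` store, and all function names are hypothetical and chosen just for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store: request id -> last reported state.
STATES: dict[str, str] = {}


def record_state_change(body: bytes) -> tuple[str, str]:
    """Parse a notification body and remember the newest state.

    The payload shape {"request_id": ..., "state": ...} is an assumption
    for this sketch, not a documented Testing Farm format.
    """
    event = json.loads(body)
    request_id, state = event["request_id"], event["state"]
    STATES[request_id] = state
    return request_id, state


class NotificationHandler(BaseHTTPRequestHandler):
    """Handles one POST per state change; idle the rest of the time."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        record_state_change(self.rfile.read(length))
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the receiver (blocks until interrupted); call this explicitly."""
    HTTPServer(("127.0.0.1", port), NotificationHandler).serve_forever()
```

One such receiver can track any number of in-flight test requests, whereas polling needs one waiting job per request.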

I am not sure if webhooks could be used with GitHub runners, but it would be great ... actually, even Fedora CI was able to avoid active waiting by using the webhook step plugin: https://plugins.jenkins.io/webhook-step/

cgwalters commented 4 months ago

OK yep, just got bit by this again - all 20 runner slots for the containers/ org were taken up by testing farm polling tasks.

ci: Change testing farm runs to not occupy a runner #496