actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0
4.76k stars 1.12k forks source link

Failing check runs causes erroneous scale down #3584

Open rickardgranberg opened 5 months ago

rickardgranberg commented 5 months ago

Checks

Controller Version

0.27.6

Helm Chart Version

0.23.7

CertManager Version

1,12,2

Deployment Method

Helm

cert-manager installation

Not a problem with cert-manager.

Checks

Resource Definitions

Not applicable, see #2118

To Reproduce

Not applicable, see #2118

Describe the bug

This bug is a continuation of issue #2118 more specifically the fixes implemented in PR #2520. The fix tries to exclude check runs from causing scale down, but as can be seen on line 220 of the fix, it only does so if the check is successful: https://github.com/actions/actions-runner-controller/blob/894732732a5a414deea5e47c9793caaa05d641b4/controllers/actions.summerwind.net/horizontal_runner_autoscaler_webhook.go#L220 By the nature of check runs, they are intended to fail if there's a problem. So I question why this filter only applies for the "success" conclusion. As it is now, it causes erroneous scale down whenever a check fails (yes. like in the original issue, we're using EnricoMi/publish-unit-test-result-action)

One observation I have is that these events have the labels set to [] so perhaps that would be a better way to detect them?

{
  "action": "completed",
  "workflow_job": {
    <redacted>
    "status": "completed",
    "conclusion": "failure",
    "created_at": "2024-06-07T12:13:20Z",
    "started_at": "2024-06-07T12:13:20Z",
    "completed_at": "2024-06-07T12:13:20Z",
    "name": "Test Results",
    "steps": [

    ],
    "check_run_url": "https://api.github.com/repos/<redacted>",
    "labels": [

    ],
    "runner_id": null,
    "runner_name": null,
    "runner_group_id": null,
    "runner_group_name": null
  },

Describe the expected behavior

I expect check run events to not cause scale down regardless of the conclusion.

Whole Controller Logs

Not applicable, see #2118

Whole Runner Pod Logs

Not applicable, see #2118

Additional Context

No response

github-actions[bot] commented 5 months ago

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

rickardgranberg commented 5 months ago

I will provide a PR with a fix if you accept one? #3594