actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

horizontalrunnerautoscaler Detected job with no labels, which is not supported by ARC. Skipping anyway #2613

Open mattpopa opened 1 year ago

mattpopa commented 1 year ago

Checks

Controller Version

v0.27.4

Helm Chart Version

0.23.3

CertManager Version

No response

Deployment Method

Helm

cert-manager installation

Yes, cert-manager was installed using:

helm upgrade --install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.11.0 \
--set installCRDs=true --wait

Checks

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: self-hosted-large
  namespace: actions-runner-system
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      serviceAccountName: github-actions-sa
      securityContext:
        # For Ubuntu 20.04 runner
        fsGroup: 1000
      organization: my-org
      image: summerwind/actions-runner-dind:latest
      imagePullPolicy: IfNotPresent
      ephemeral: true
      dockerEnabled: false
      dockerdWithinRunnerContainer: true
      containers:
      - name: runner
        resources:
          requests:
            memory: "10Gi"
            cpu: "3000m"
          limits:
            memory: "10Gi"
            cpu: "3000m"
      labels:
        - large
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  namespace: actions-runner-system
  name: self-hosted-large
spec:
  scaleDownDelaySecondsAfterScaleOut: 10
  scaleTargetRef:
    kind: RunnerDeployment
    name: self-hosted-large
  minReplicas: 0
  maxReplicas: 6
  metrics:
    - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
      repositoryNames:
        - frontend

To Reproduce

This happens randomly. The affected jobs specify their labels in this format:

runs-on: [self-hosted, large]

https://github.com/actions/actions-runner-controller/blob/032443fcfd4cf7b6e8bb09ed9dca639bcba9f8a4/controllers/actions.summerwind.net/autoscaling.go#L153
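For readers following the link above: the log message suggests that the autoscaler skips any job the GitHub API reports with an empty label list, so that job never counts toward the desired replicas. With `minReplicas: 0`, that would be consistent with a job waiting forever. Below is a minimal Python sketch of that behavior, under stated assumptions. ARC itself is written in Go; `Job`, `count_matching_jobs`, and the label-subset rule are hypothetical simplifications for illustration, not the actual source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """Minimal stand-in for a workflow job as reported by the GitHub API."""
    job_id: int
    status: str  # "queued", "in_progress", or "completed"
    labels: List[str] = field(default_factory=list)


def count_matching_jobs(jobs, runner_labels):
    """Count queued/in-progress jobs that a runner set could pick up.

    Assumption: a job with an empty label list is skipped entirely,
    mirroring the "Detected job with no labels" log line.
    Returns (number of matching jobs, ids of skipped jobs).
    """
    # Assumption: "self-hosted" is implicitly present on every runner.
    available = set(runner_labels) | {"self-hosted"}
    needed = 0
    skipped = []
    for job in jobs:
        if job.status not in ("queued", "in_progress"):
            continue  # finished jobs do not need a runner
        if not job.labels:
            skipped.append(job.job_id)  # never counted toward replicas
            continue
        if set(job.labels) <= available:
            needed += 1
    return needed, skipped
```

If the only queued job comes back with `labels: []`, this sketch returns `(0, [job_id])`, meaning no scale-up would be requested for it.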


### Describe the bug

Randomly, the `horizontalrunnerautoscaler` doesn't update the desired replicas, and the job waits indefinitely in GitHub with this message:

Requested labels: self-hosted, large
Job defined at: my-org/frontend/.github/workflows/zcommon_web_e2e_tests.yml@refs/heads/master
Reusable workflow chain:
my-org/frontend/.github/workflows/web_scheduled_e2e.yml@refs/heads/master (a9790cfa59ca77ead2f8ec4987a9cac8e98cfcce)
-> my-org/frontend/.github/workflows/zcommon_web_e2e_tests.yml@refs/heads/master (a9790cfa59ca77ead2f8ec4987a9cac8e98cfcce)
Waiting for a runner to pick up this job...

The job uses the following label format:

runs-on: [self-hosted, large]


Should it make any difference to the `horizontalrunnerautoscaler` whether the labels are quoted? That is, this:

runs-on: [self-hosted, large]

versus this:

runs-on: ["self-hosted", "large"]
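As far as YAML itself is concerned, the two forms should be equivalent: plain and double-quoted scalars both parse to the same strings. The toy Python function below (hypothetical, handling only this one-line flow-sequence form and not a real YAML parser) illustrates the result a YAML parser would produce for either spelling.

```python
def parse_flow_labels(value):
    """Parse a one-line YAML flow sequence of simple scalars into strings.

    Toy parser for illustration only: handles `[a, b]` with optional
    single or double quotes; no escapes, no nesting, no comments.
    """
    inner = value.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a flow sequence like [a, b]")
    items = []
    for raw in inner[1:-1].split(","):
        item = raw.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(item) >= 2 and item[0] == item[-1] and item[0] in "\"'":
            item = item[1:-1]
        if item:
            items.append(item)
    return items
```

Both `parse_flow_labels('[self-hosted, large]')` and `parse_flow_labels('["self-hosted", "large"]')` yield the same two labels, so quoting alone seems an unlikely cause of the empty `labels` in the log.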

Any suggestions on how to debug this further?
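One possible way to narrow this down is to ask the GitHub REST API what labels it reports for the jobs of the affected run, using the `run_id` from the controller log. The sketch below uses the "List jobs for a workflow run" endpoint. The function names are hypothetical, and it assumes a token with read access to the repository's Actions data; note that the endpoint is paginated, so only the first page is inspected here.

```python
import json
import urllib.request


def fetch_run_jobs(owner, repo, run_id, token):
    """Fetch the first page of jobs for one workflow run."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/runs/{run_id}/jobs"
    )
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_labels(payload):
    """Return (job_id, status, labels) for every job in an API response."""
    return [
        (job["id"], job.get("status"), job.get("labels") or [])
        for job in payload.get("jobs", [])
    ]


def jobs_without_labels(payload):
    """Return the ids of jobs that the API reports with no labels."""
    return [
        job_id
        for job_id, _status, labels in summarize_labels(payload)
        if not labels
    ]
```

For example, calling `jobs_without_labels(fetch_run_jobs("my-org", "frontend", 5044287443, token))` would show whether the API itself returns job 13654547143 with an empty label list, or whether the labels are lost somewhere later.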

### Describe the expected behavior

We shouldn't see this message in the ARC logs, and the queued job should trigger a scale-up.

### Whole Controller Logs

```shell
2023-05-22T10:02:22Z    INFO    horizontalrunnerautoscaler  Detected job with no labels, which is not supported by ARC. Skipping anyway.    {"labels": [], "run_id": 5044287443, "job_id": 13654547143}
```

### Whole Runner Pod Logs

```shell
there are no runner logs available
```

Additional Context

There is no runner in a pending state, and there are available resources on the node(s).

rtsisyk commented 12 months ago

I have the same issue and I don't understand how to use TotalNumberOfQueuedAndInProgressWorkflowRuns

ktomaszx commented 1 month ago

Same issue. Does anyone know what the root cause is?

VladKhachikyan commented 1 month ago

Same issue