actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0
4.72k stars 1.12k forks source link

useRunnerGroupsVisibility = true - ARC scaling wrong runners #1872

Open extravio opened 2 years ago

extravio commented 2 years ago

Checks

Controller Version

v0.25.2

Helm Chart Version

0.20.2

CertManager Version

1.8

Deployment Method

Helm

cert-manager installation

correct

Checks

Resource Definitions

-

To Reproduce

1. Create "general-dev" + "test-dev" Runner Groups in Github
2. Deploy ARC with useRunnerGroupsVisibility = true, set up the webhook, etc.
3. Create 1 runner deployment in the "general-dev" runner group with hra (1 to 5). Use "general-dev" label
4. Create 1 runner deployment in the "test-dev" runner group with hra (1 to 5). Also use "general-dev" label
5. Give your test Organization access to only the "test-dev" runner group
6. Create a workflow and specify run-on: "general-dev"
7. Execute the workflow n times

Describe the bug

As the test organization only has access to the "test-dev" runner group, the test dev runners shoud be scaled up. Instead, the general-dev runners are scaled up: image (ignore the general-win-dev runners)

Workflows are processed 1 by 1 by the single test-dev pod.

The payload of the webhook show the correct runner group image

Describe the expected behavior

"test-dev" pods would scale up and workflows would be processed in parallel

Controller Logs

-

Runner Pod Logs

-

Additional Context

No response

mumoshu commented 2 years ago

@extravio Hey! Thanks for reporting. AFAIK workflow job events had no "runner_group" info in the payload when we implemented this feature, so that we can make the autoscaling decision more correct? Just to be extra sure, what's the status of the workflow job event you're looking into? Is it queued, or is it in_progress?

extravio commented 2 years ago

Hi @mumoshu , Thanks for the super quick reply.

I guess the runner_group_name would have to be in the "workflow_job.queued" for ARC to process it correctly.

However, I thought that by setting "useRunnerGroupsVisibility = true", ARC would make the necessary API calls to check what runner groups are available to the repo and scale the appropriate runners.

mumoshu commented 2 years ago

@extravio Thanks! Makes sense. BTW, have you already configured ARC's github webhook server to use the secret containing the GitHub API credentials to call the necessary API for taking the runner group visibility into account? I thought it was a helm value githubWebhookServer.authSecret.name or something like that. If you didn't give it a secret, it would never try to call the API to determine the runner group visibility.

extravio commented 2 years ago

Hi @mumoshu

That might be the problem. The "githubWebhookServer.authSecret.name" is set to a secret defined in the webhook configuration in GitHub. I've set the "githubWebhookServer.authSecret.github_token" to the same secret as the controller manager. This PAT has admin:enterprise - manage_runners:enterprise permissions. Does "github_token" expects the actual PAT instead of a secret reference?

authSecret:
  enabled: true
  create: false
  name: "controller-manager"
  annotations: {}
[...]
githubWebhookServer:
  enabled: ${githubWebhookServer_enabled}
  replicaCount: 1
  useRunnerGroupsVisibility: true
  secret:
    enabled: true
    create: false
    name: "github-webhook-server"
    ### GitHub Webhook Configuration
    # github_webhook_secret_token: ""
    ### GitHub Apps Configuration
    ## NOTE: IDs MUST be strings, use quotes
    #github_app_id: ""
    #github_app_installation_id: ""
    #github_app_private_key: |
    ### GitHub PAT Configuration
    github_token: "controller-manager"

I can see in the webhook pod logs that the groups are identified correctly:

 Found some runner groups are managed by ARC     {"event": "workflow_job", [...], "groups": "RunnerGroup{Scope:Enterprise, Kind:Custom, Name:general-dev}, RunnerGroup{Scope:Enterprise, Kind:Custom, Name:test-dev}"}
[...]
 Found 1 HRAs by key     {"key": "enterprises/<myenterprise>/group/general-dev"}
 Found 1 HRAs by key     {"key": "enterprises/<myenterprise>/group/test-dev"}
mumoshu commented 2 years ago

Does "github_token" expects the actual PAT instead of a secret reference?

@extravio Yes!

extravio commented 2 years ago

Hi @mumoshu , I've created a new token image and hard-coded the value in the Helm chart:

githubWebhookServer:
  enabled: ${githubWebhookServer_enabled}
  replicaCount: 1
  useRunnerGroupsVisibility: true
  secret:
    enabled: true
    create: false
    name: "github-webhook-server"
    github_token: "ghp_[...]"

The token does not seem to be used and pods in both runner groups are scaled randomly: image

mumoshu commented 2 years ago

@extravio I think you need to set githubWebhookServer.secret.create to true to let it the chart create and update the secret for you. Check the existing github-webhook-server secret- you'll see an incorrect github_token value in it.

extravio commented 2 years ago

Sorry @mumoshu , I'm a bit confused here. In the secret github-webhook-server, there's a copy of the secret defined in the webhook configuration in Github (to let github authenticate with the webhook server) image I think it's correct as if I change this token, the webhook fails.

I thought the githubWebhookServer.secret.github_* properties were for GitHub authentication (so that the webhook server could call the GitHub API and get info about the Runner Groups). That's what I've understood from the Runner Groups section of the documentation.

mumoshu commented 2 years ago

@extravio I may be still missing something, that I think we're both correct.

You do need to give github_webhook_secret_token via https://github.com/actions-runner-controller/actions-runner-controller/blob/master/charts/actions-runner-controller/values.yaml#L200 if you want to verity the webhook sender with the token generated by GitHub and visible in the Manage-webhook page.

However, you also need to provide a valid GitHub API authentication info, either PAT or GitHub App credentials, via https://github.com/actions-runner-controller/actions-runner-controller/blob/master/charts/actions-runner-controller/values.yaml#L201-L207, to enable the runner group visibility feature.

extravio commented 2 years ago

Hi @mumoshu , thanks for your help. I was not able to make it work though. I've got to move on so I've decided to use labels as a work-around for now. I'll revisit it when I have a bit more time.

Sam-Lin-MillersLab commented 1 year ago

@extravio I may be still missing something, that I think we're both correct.

You do need to give github_webhook_secret_token via https://github.com/actions-runner-controller/actions-runner-controller/blob/master/charts/actions-runner-controller/values.yaml#L200 if you want to verity the webhook sender with the token generated by GitHub and visible in the Manage-webhook page.

However, you also need to provide a valid GitHub API authentication info, either PAT or GitHub App credentials, via https://github.com/actions-runner-controller/actions-runner-controller/blob/master/charts/actions-runner-controller/values.yaml#L201-L207, to enable the runner group visibility feature.

what permission should I set here for this PAT?