go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Cannot (sometimes) find runner by label when multiple self-hosted runners are available #32348

Open tomasmusil opened 1 month ago

tomasmusil commented 1 month ago

Description

I am running a self-hosted Gitea instance with two runners, both set up as host runners (no Docker). One is on the same server as my Gitea instance; the other is on a separate server. Both have ":host" at the end of their labels in the .runner file. [image]
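For illustration only, a host-mode label in the act runner config.yaml has the form name:host, as in the fragment below. The label name noxlabs_runner is inferred from the runs-on value used in the workflow; the actual labels of both runners appear only in the screenshot, so treat the value shown here as an assumption.

```yaml
# config.yaml fragment for one of the runners (hypothetical value)
runner:
  labels:
    # "<name>:host" runs jobs directly on the machine instead of in Docker;
    # a job selects this runner with `runs-on: <name>`
    - "noxlabs_runner:host"
```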

I have a workflow that deploys to the local machine.

-- ci.yml --

...
jobs:
  build:
    uses: ./.gitea/workflows/reusable/build.yml
    secrets: inherit
...

-- reusable/build.yml --

...
jobs:
  build:
    runs-on: noxlabs_runner
    steps:
      - name: Git clone to work directory
        uses: actions/checkout@v4
...

With this setup the job fails most of the time, though not always, with these logs: [image]

On line 1 of the log, Gitea appears to pick the wrong runner by label (the action above should have run on noxlabs_runner), and the job then proceeds to run Docker images even though it should run on the host. If I manually rerun the job, it finds the correct runner and completes the workflow after 10-15 attempts.

The problem persists whether one job is running or several are running concurrently. It did not happen when I ran multiple concurrent jobs with only one runner configured, so it seems to be tied to having two active runners rather than to job concurrency.

Gitea Version

1.22.3

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

2.34.1

Operating System

Both servers are Ubuntu 22.04 x86_64

How are you running Gitea?

Gitea is deployed from the official release binary, as are both runners. Gitea and the runners are run with systemd.

Database

PostgreSQL

marza-sergey commented 5 days ago

I'm having the same issue with a host runner running on Windows. It seems that Gitea ignores the runs-on: statement in the reusable workflow. If I add a runs-on: setting to the original workflow that calls the reusable workflow, Gitea has no problem finding the right runner.

It can be easily replicated by disabling all the runners except the host one. This is the workflow that's not working: pull-checks.yaml

name: Check PR
run-name: ${{ gitea.actor }} checking PR
on: [pull_request]
jobs:
  test-version:
    uses: http://path/to/my/reusable/workflow/test_version.yaml@main
    with:
      package-name: pyf
      ssh-key: ${{ secrets.DEPLOY_SSH_KEY }}
      my-pypi-token: ${{ secrets.PYPI_TOKEN }}

And corrected:

name: Check PR
run-name: ${{ gitea.actor }} checking PR
on: [pull_request]
jobs:
  test-version:
    uses: http://path/to/my/reusable/workflow/test_version.yaml@main
    runs-on: rocky-8
    with:
      package-name: pyf
      ssh-key: ${{ secrets.DEPLOY_SSH_KEY }}
      my-pypi-token: ${{ secrets.PYPI_TOKEN }}

test_version.yaml

name: Test version
run-name: ${{ gitea.actor }} testing if version is correctly specified
on:
  workflow_call:
    inputs:
      package-name:
        required: true
        type: string
      package-directory:
        required: true
        type: string
        default: .
      my-repository:
        required: true
        type: string
        default: http://path/to/repo
      my-pypi-token:
        required: true
        type: string
      ssh-key:
        type: string
        required: false
        description: Required if the repository has a GitHub submodule

jobs:
  check-version:
    runs-on: rocky-8
    steps:
    ...

And the configuration for the act runner: config.yaml

log:
  level: debug

runner:
  file: .runner
  capacity: 1
  env_file: .env
  timeout: 3h
  shutdown_timeout: 0s
  insecure: false
  fetch_timeout: 5s
  fetch_interval: 2s
  labels:
    - "windows:host"

cache:
  enabled: true
  dir: ""
  host: ""
  port: 0
  external_server: ""

host:
  workdir_parent:

.runner

{
  "WARNING": "This file is automatically generated by act-runner. Do not edit it manually unless you know what you are doing. Removing this file will cause act runner to re-register as a new runner.",
  "id": 15,
  "uuid": "<redacted>",
  "name": "<redacted>",
  "token": "<redacted>",
  "address": "<redacted>",
  "labels": [
    "windows:host"
  ]
}

tomasmusil commented 3 days ago

> If I add runs-on: setting in the original workflow that calls reusable workflow, gitea has no problem finding the right runner.

Thank you for the tip. In the meantime I just moved all the code from the reusable workflows into the main workflows, as it is not that big of a hassle, and I get the added benefit of Gitea actually showing the individual steps in the UI instead of combining them into "Set up job". With that I verified that the issue is specific to reusable workflows: without them there seems to be no problem selecting the correct runner.
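A minimal sketch of that workaround, assuming the build job from reusable/build.yml is inlined into ci.yml unchanged. Only the runs-on label and the checkout step come from the snippets in the original report; the workflow name and trigger are placeholders, and the remaining steps stay elided as they were there.

```yaml
# ci.yml (sketch): the job is defined directly instead of calling
# ./.gitea/workflows/reusable/build.yml
name: CI          # placeholder name, not from the original workflow
on: [push]        # placeholder trigger, not from the original workflow
jobs:
  build:
    runs-on: noxlabs_runner   # runs-on now sits in the top-level workflow
    steps:
      - name: Git clone to work directory
        uses: actions/checkout@v4
      # ... remaining steps from reusable/build.yml go here
```

The trade-off is duplication: any job shared between several workflows has to be copied into each of them until runner selection works for reusable workflows.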