actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

Add workflow run times to ARC metrics #2359

Open kkaresz-tw opened 1 year ago

kkaresz-tw commented 1 year ago

What would you like added?

In addition to the labels discussed in https://github.com/actions/actions-runner-controller/issues/2176 and implemented in https://github.com/actions/actions-runner-controller/pull/2218 and https://github.com/actions/actions-runner-controller/pull/2225, it would be good to have the metrics server report workflow-related metrics as well. These could be collected from the workflow_run events.

In addition to:

github_workflow_job_run_duration_seconds_bucket
github_workflow_job_run_duration_seconds_count
github_workflow_job_run_duration_seconds_sum

github_workflow_jobs_started_total
github_workflow_jobs_completed_total

the following would also be useful to see:

github_workflow_run_duration_seconds_bucket
github_workflow_run_duration_seconds_count
github_workflow_run_duration_seconds_sum

github_workflows_started_total
github_workflows_completed_total
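
As a rough sketch of how these series could be derived from workflow_run webhook events: the code below assumes the duration of a completed run can be approximated as `updated_at` minus `run_started_at` from the payload's `workflow_run` object. The class name, the choice of the `in_progress` action as "started", and the bucket boundaries are all hypothetical; this is not how ARC implements its metrics.

```python
from datetime import datetime

# Hypothetical bucket upper bounds in seconds; real buckets may differ.
BUCKETS = [60, 300, 600, 1800, 3600]


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2023-03-06T10:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class WorkflowRunMetrics:
    """In-memory stand-in for the proposed counters and histogram."""

    def __init__(self):
        self.started_total = 0      # github_workflows_started_total
        self.completed_total = 0    # github_workflows_completed_total
        self.duration_sum = 0.0     # ..._run_duration_seconds_sum
        self.duration_count = 0     # ..._run_duration_seconds_count
        # ..._run_duration_seconds_bucket, keyed by the "le" upper bound
        self.duration_bucket = {le: 0 for le in BUCKETS + [float("inf")]}

    def handle(self, payload):
        """Update the metrics from one decoded workflow_run event payload."""
        action = payload.get("action")
        run = payload["workflow_run"]
        if action == "in_progress":
            self.started_total += 1
        elif action == "completed":
            self.completed_total += 1
            # Assumption: updated_at approximates the completion time.
            elapsed = parse_ts(run["updated_at"]) - parse_ts(run["run_started_at"])
            seconds = elapsed.total_seconds()
            self.duration_sum += seconds
            self.duration_count += 1
            # Histogram buckets are cumulative: every bound >= value counts.
            for le in self.duration_bucket:
                if seconds <= le:
                    self.duration_bucket[le] += 1


metrics = WorkflowRunMetrics()
metrics.handle({
    "action": "completed",
    "workflow_run": {
        "run_started_at": "2023-03-06T10:00:00Z",
        "updated_at": "2023-03-06T10:07:30Z",
    },
})
print(metrics.duration_sum, metrics.duration_bucket)
```

A run of 7.5 minutes lands in the 600-second bucket and every larger one, which is the cumulative behaviour the `_bucket` series would have.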

I might try to open a PR for this unless someone quicker beats me to it.

Why is this needed?

The sum of the job run times in a given workflow does not equal the time the workflow actually took to finish, because jobs can run in parallel, wait in a queue, and so on.
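
To illustrate the gap with made-up numbers: the sketch below compares the summed job run times with the wall-clock time of a hypothetical workflow that has three parallel jobs and a final job that waits in a queue. The job names, timestamps, and helper functions are invented for the example.

```python
from datetime import datetime, timedelta


def at(hour, minute):
    """Build a timestamp on an arbitrary example day."""
    return datetime(2023, 3, 6, hour, minute)


def summed_job_time(jobs):
    """Add up each job's own run time (what per-job metrics would give)."""
    return sum((end - start for start, end in jobs.values()), timedelta())


def wall_clock_time(triggered_at, jobs):
    """Time from the workflow being triggered until its last job finishes."""
    return max(end for _, end in jobs.values()) - triggered_at


triggered_at = at(10, 0)
jobs = {
    # Three jobs start in parallel after a one-minute queue wait.
    "lint": (at(10, 1), at(10, 3)),
    "unit": (at(10, 1), at(10, 11)),
    "integration": (at(10, 1), at(10, 13)),
    # The last job needs the others and waits two more minutes for a runner.
    "deploy": (at(10, 15), at(10, 18)),
}

print(summed_job_time(jobs))                # 0:27:00
print(wall_clock_time(triggered_at, jobs))  # 0:18:00
```

Here the jobs add up to 27 minutes, while an engineer waiting on the workflow sees 18 minutes, 3 of which are queue time that no job duration records.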

If I wanted to measure how long engineers in our organisation need to wait for CI, workflow run times make more sense for us. If our CI team improved a shared or required workflow by replacing or rewriting a job or an action, or changed anything in the (self-hosted) infrastructure, e.g. using larger nodes or changing the autoscaling, we would like to measure what impact those changes had, if any.

Also, if we wanted to feed into our business metrics by measuring the time a PR took from start to finish, including how much of that time was spent in CI, workflow run times would be better than individual job run times.

Additional context

In my organisation we're working on our own version of workflow and job metrics, but it has its own issues and adds toil for the team, which could be eliminated if ARC provided these numbers out of the box.

github-actions[bot] commented 1 year ago

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

mumoshu commented 1 year ago

Hey @kkaresz-tw!

> If I wanted to measure how long engineers in our organization need to wait for CI, workflow run times make more sense for us

Great. That does make sense to me!