actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0
4.71k stars 1.11k forks source link

`ServiceMonitor` label selector logic for github-webhook-server component is broken #2106

Open sdickhoven opened 1 year ago

sdickhoven commented 1 year ago

Checks

Controller Version

0.26.0

Helm Chart Version

0.21.1

CertManager Version

1.8.0

Deployment Method

Helm

cert-manager installation

n/a. this has nothing to do with the controller. this is a helm chart bug.

Checks

Resource Definitions

n/a. this has nothing to do with the controller. this is a helm chart bug.

To Reproduce

1. run

$ helm template test actions-runner-controller/actions-runner-controller \
    --version 0.21.1 \
    --set "githubWebhookServer.enabled=true,metrics.serviceMonitor=true"

Describe the bug

the Service by the name of test-actions-runner-controller-github-webhook-server has the following labels:

app.kubernetes.io/name: actions-runner-controller
app.kubernetes.io/instance: test

the ServiceMonitor by the name of test-actions-runner-controller-github-webhook-s-service-monitor (which should be selecting the above Service) has the label selectors:

app.kubernetes.io/name: actions-runner-controller
app.kubernetes.io/instance: test-github-webhook-server

i.e. this ServiceMonitor does not select any Services (and certainly not the Service of the github-webhook-server).

furthermore, the ServiceMonitor for the main controller component (test-actions-runner-controller-service-monitor) now selects a mixed set of controller- and github-webhook-server pods (because both Services have the same name & instance labels)... which means that the metrics for the main controller component will be somewhat "confused" when the github-webhook-server component is enabled.

Describe the expected behavior

there's an error in the labeling logic for the github-webhook-server Service.

this:

https://github.com/actions/actions-runner-controller/blob/actions-runner-controller-0.21.1/charts/actions-runner-controller/templates/githubwebhook.service.yaml#L7-L8

should be

  labels:
    {{- include "actions-runner-controller-github-webhook-server.selectorLabels" . | nindent 4 }}

by the same token, it would probably be more correct to add the same labels to the ServiceMonitor itself as well... though that would be a cosmetic fix only:

https://github.com/actions/actions-runner-controller/blob/actions-runner-controller-0.21.1/charts/actions-runner-controller/templates/githubwebhook.serviceMonitor.yaml#L5-L6

alternatively, you could turn the ServiceMonitor for the github-webhook-server component into a PodMonitor since the pod labels are already correct.

if you did that, you would no longer need:

https://github.com/actions/actions-runner-controller/blob/actions-runner-controller-0.21.1/charts/actions-runner-controller/templates/githubwebhook.service.yaml#L19-L23

however, that would make the helm config metrics.serviceMonitor somewhat misleading.

Whole Controller Logs

n/a. this has nothing to do with the controller. this is a helm chart bug.

Whole Runner Pod Logs

n/a. this has nothing to do with the controller. this is a helm chart bug.

Additional Context

on a related note...

the label app.kubernetes.io/instance is not used correctly in this helm chart.

i realize that this was most likely a concession to backward compatibility. but if you ever have to release a breaking change for the helm chart, you might as well fix the labeling to be e.g.:

app.kubernetes.io/name: actions-runner-controller
app.kubernetes.io/instance: test
app.kubernetes.io/component: controller

and

app.kubernetes.io/name: actions-runner-controller
app.kubernetes.io/instance: test
app.kubernetes.io/component: github-webhook-server

instead of

app.kubernetes.io/name: actions-runner-controller
app.kubernetes.io/instance: test

and

app.kubernetes.io/name: actions-runner-controller
app.kubernetes.io/instance: test-github-webhook-server
github-actions[bot] commented 1 year ago

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.