Kubernetes controller for GitHub Actions self-hosted runners
[gha-runner-scale-set-controller] metrics not exposed for the listener #3510

isatfg commented 1 month ago


Controller Version


Deployment Method



To Reproduce

1. In the values.yaml enable the metrics
  metrics:
    controllerManagerAddr: ":8080"
    listenerAddr: ":8080"
    listenerEndpoint: "/metrics"

2. Now port-forward to the listener pod on the port configured (8080)

3. In a browser go to localhost:8080/metrics

You will get an EOF Error.

Describe the bug

I have enabled metrics in the gha-runner-scale-set-controller:
  metrics:
    controllerManagerAddr: ":8080"
    listenerAddr: ":8080"
    listenerEndpoint: "/metrics"

I can see that the controller pod is exposing metrics on port 8080/metrics

  gha_controller_failed_ephemeral_runners
  gha_controller_pending_ephemeral_runners
  gha_controller_running_ephemeral_runners
  gha_controller_running_listeners

According to the documentation, the listener is the owner of some metrics, e.g.

  gha_assigned_jobs
  gha_running_jobs

However, these metrics are not exposed on the controller or the listener. When I port-forward to the listener and go to the metrics endpoint, e.g. localhost:8080/metrics, I get an error:

an error occurred forwarding 8080

Describe the expected behavior

When I port-forward to the listener I should get metrics in the same way I get metrics from the controller.
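
A minimal sketch of how this expectation could be checked once a port-forward is running: it parses Prometheus text-format output and reports which of the listener metrics named above are absent. The sample text and its `name` label are made up for illustration; in practice the text would be whatever localhost:8080/metrics returns.

```python
def metric_names(exposition: str) -> set[str]:
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (labels) or at whitespace (value).
        names.add(line.split("{", 1)[0].split()[0])
    return names


def missing_metrics(exposition: str, expected: set[str]) -> set[str]:
    """Return the expected metric names that do not appear in the output."""
    return expected - metric_names(exposition)


# Listener-owned metrics mentioned in this issue.
LISTENER_METRICS = {"gha_assigned_jobs", "gha_running_jobs"}

# Hypothetical sample output; the label and values are illustrative only.
sample = """\
# TYPE gha_assigned_jobs gauge
gha_assigned_jobs{name="demo"} 0
gha_running_jobs{name="demo"} 2
"""

if __name__ == "__main__":
    print("missing:", sorted(missing_metrics(sample, LISTENER_METRICS)))
```

An empty result means both listener metrics are present in the scraped text.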

Additional Context

# Default values for gha-runner-scale-set-controller.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
labels: {}

# leaderElection will be enabled when replicaCount>1,
# So, only one replica will be in charge of reconciliation at a given time
# leaderElectionId will be set to {{ define gha-runner-scale-set-controller.fullname }}.
replicaCount: 1

image:
  repository: ""
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: ""

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

## Define environment variables for the controller pod
#  - name: "ENV_VAR_NAME_1"
#    value: "ENV_VAR_VALUE_1"
#  - name: "ENV_VAR_NAME_2"
#    valueFrom:
#      secretKeyRef:
#        key: ENV_VAR_NAME_2
#        name: secret-name
#        optional: true

serviceAccount:
  # Specifies whether a service account should be created for running the controller pod
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  # You can not use the default service account for this.
  name: ""

podAnnotations: "true" "8080"

podLabels: {}

podSecurityContext: {}
# fsGroup: 2000

securityContext: {}
# capabilities:
#   drop:
#   - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000

resources: {}
## We usually recommend not to specify default resources and to leave this as a conscious
## choice for the user. This also increases chances charts run on environments with little
## resources, such as Minikube. If you do want to specify resources, uncomment the following
## lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
#   cpu: 100m
#   memory: 128Mi
# requests:
#   cpu: 100m
#   memory: 128Mi

nodeSelector: {}

tolerations: []

affinity: {}

# Mount volumes in the container.
volumes: []
volumeMounts: []

# Leverage a PriorityClass to ensure your pods survive resource shortages
# ref:
# PriorityClass: system-cluster-critical
priorityClassName: ""

## If `metrics:` object is not provided, or commented out, the following flags
## will be applied to the controller-manager and listener pods with empty values:
## `--metrics-addr`, `--listener-metrics-addr`, `--listener-metrics-endpoint`.
## This will disable metrics.
## To enable metrics, uncomment the following lines.
metrics:
  controllerManagerAddr: ":8080"
  listenerAddr: ":8080"
  listenerEndpoint: "/metrics"

flags:
  ## Log level can be set here with one of the following values: "debug", "info", "warn", "error".
  ## Defaults to "debug".
  logLevel: "debug"
  ## Log format can be set with one of the following values: "text", "json"
  ## Defaults to "text"
  logFormat: "text"

  ## Restricts the controller to only watch resources in the desired namespace.
  ## Defaults to watch all namespaces when unset.
  # watchSingleNamespace: ""

  ## Defines how the controller should handle upgrades while having running jobs.
  ## The strategies available are:
  ## - "immediate": (default) The controller will immediately apply the change causing the
  ##   recreation of the listener and ephemeral runner set. This can lead to an
  ##   overprovisioning of runners, if there are pending / running jobs. This should not
  ##   be a problem at a small scale, but it could lead to a significant increase of
  ##   resources if you have a lot of jobs running concurrently.
  ## - "eventual": The controller will remove the listener and ephemeral runner set
  ##   immediately, but will not recreate them (to apply changes) until all
  ##   pending / running jobs have completed.
  ##   This can lead to a longer time to apply the change but it will ensure
  ##   that you don't have any overprovisioning of runners.
  updateStrategy: "immediate"

Controller Logs

Runner Pod Logs

nikola-jokic commented 1 month ago

Hey @isatfg,

Please correct me if I'm wrong, but the error says that port forwarding is the problem. Is it possible that you tried to forward both the controller and the listener on the same port? I successfully forwarded both the controller and the listener metrics.
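
One way to rule out a local port clash before starting the two port-forwards is to check whether the local port is already bound. A minimal Python sketch, assuming the forwards listen on 127.0.0.1 (the `kubectl port-forward` default) and that 8081 is a hypothetical second local port picked for the listener:

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # bind() fails with OSError if another process holds the port.
            probe.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    # 8080 is the port from the issue; 8081 is an illustrative alternative.
    for port in (8080, 8081):
        state = "free" if port_is_free(port) else "already in use"
        print(f"local port {port}: {state}")
```

If 8080 is reported as in use, forwarding the second pod to a different local port (for example `8081:8080`) avoids the conflict.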

isatfg commented 1 month ago

Hey @nikola-jokic

Hmm, so I was trying to reproduce the issue again to explain the steps, and now I see metrics on the listener as expected. I honestly have no idea what happened.

So now I have the metrics enabled, and I can port-forward to the controller and get controller metrics, and port-forward to the listener and get listener metrics. Apologies for wasting your time.

Thank you