Kong / charts

Helm chart for Kong

No metrics from kong-controller #1053

Open NissesSenap opened 7 months ago

NissesSenap commented 7 months ago

I'm trying to get controller metrics into Prometheus using the ingress chart. I'm on Helm chart ingress-0.12.0, and my values file looks like this:

controller:
  serviceMonitor:
    enabled: true
  podAnnotations: {} # disable kuma and other sidecar injection

gateway:
  serviceMonitor:
    enabled: true
  replicaCount: 3

Reading https://github.com/Kong/charts/tree/main/charts/kong#prometheus-operator-integration, it says that I should set:

ingressController:
  labels:
    enable-metrics: "true"

The problem is that there is no ingressController.labels key: https://github.com/Kong/charts/blob/6906fa6b8a538d4b99f51167e573d1d4f9871f28/charts/kong/values.yaml#L531

But let's look at the generated ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kong-controller
  namespace: kong
spec:
  endpoints:
  - scheme: http
    targetPort: status
  - scheme: http
    targetPort: cmetrics
  jobLabel: kong
  namespaceSelector:
    matchNames:
    - kong
  selector:
    matchLabels:
      app.kubernetes.io/instance: kong
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: controller
      app.kubernetes.io/version: "3.6"
      enable-metrics: "true"
      helm.sh/chart: controller-2.38.0

It also seems like there is no Service that exposes this endpoint, at least not one that matches those label selectors.

➜ k get svc -l app.kubernetes.io/instance=kong
NAME                                 TYPE           CLUSTER-IP        EXTERNAL-IP     PORT(S)                         AGE
kong-controller-validation-webhook   ClusterIP      192.168.201.162   <none>          443/TCP                         19h
kong-gateway-admin                   ClusterIP      None              <none>          8444/TCP                        19h
kong-gateway-manager                 NodePort       192.168.198.207   <none>          8002:31158/TCP,8445:30529/TCP   19h
kong-gateway-proxy                   LoadBalancer   192.168.207.157   104.199.43.98   80:32404/TCP,443:31598/TCP      19h

➜ k get svc -l app.kubernetes.io/managed-by=Helm
NAME                                 TYPE           CLUSTER-IP        EXTERNAL-IP     PORT(S)                         AGE
kong-controller-validation-webhook   ClusterIP      192.168.201.162   <none>          443/TCP                         19h
kong-gateway-admin                   ClusterIP      None              <none>          8444/TCP                        19h
kong-gateway-manager                 NodePort       192.168.198.207   <none>          8002:31158/TCP,8445:30529/TCP   19h
kong-gateway-proxy                   LoadBalancer   192.168.207.157   104.199.43.98   80:32404/TCP,443:31598/TCP      19h

➜ k get svc -l app.kubernetes.io/name=controller
NAME                                 TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)   AGE
kong-controller-validation-webhook   ClusterIP   192.168.201.162   <none>        443/TCP   19h

➜ k get svc -l app.kubernetes.io/version=3.6
NAME                                 TYPE           CLUSTER-IP        EXTERNAL-IP     PORT(S)                         AGE
kong-controller-validation-webhook   ClusterIP      192.168.201.162   <none>          443/TCP                         19h
kong-gateway-admin                   ClusterIP      None              <none>          8444/TCP                        19h
kong-gateway-manager                 NodePort       192.168.198.207   <none>          8002:31158/TCP,8445:30529/TCP   19h
kong-gateway-proxy                   LoadBalancer   192.168.207.157   104.199.43.98   80:32404/TCP,443:31598/TCP      19h

➜ k get svc -l enable-metrics=true    
NAME                 TYPE           CLUSTER-IP        EXTERNAL-IP     PORT(S)                      AGE
kong-gateway-proxy   LoadBalancer   192.168.207.157   104.199.43.98   80:32404/TCP,443:31598/TCP   19h

➜ k get svc -l helm.sh/chart=controller-2.38.0
NAME                                 TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)   AGE
kong-controller-validation-webhook   ClusterIP   192.168.201.162   <none>        443/TCP   19h

So there is no single selector that matches all of those labels. kong-controller-validation-webhook is the closest, but it doesn't expose the cmetrics port.

To me, it seems like we need to add a new Service for the controller that is created when serviceMonitor.enabled: true is set, and we need to add ingressController.labels.
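
For illustration, a Service along these lines would satisfy the generated ServiceMonitor's selector and expose the cmetrics port. This is only a sketch based on the output above; the name, the 10255 metrics port, and the pod selector labels are assumptions, not something the chart renders today.

apiVersion: v1
kind: Service
metadata:
  name: kong-controller-metrics   # assumed name
  namespace: kong
  labels:
    app.kubernetes.io/instance: kong
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: controller
    app.kubernetes.io/version: "3.6"
    enable-metrics: "true"
    helm.sh/chart: controller-2.38.0
spec:
  clusterIP: None                  # headless is enough; the metrics are per-Pod
  selector:
    app.kubernetes.io/component: app
    app.kubernetes.io/instance: kong
    app.kubernetes.io/name: controller
  ports:
    - name: cmetrics
      port: 10255                  # assumed controller metrics port
      targetPort: cmetrics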

Unless I have missed something obvious, I'd love to get some feedback on this.

NissesSenap commented 7 months ago

As a workaround, I added a simple PodMonitor in my values file:

  extraObjects:
    - apiVersion: monitoring.coreos.com/v1
      kind: PodMonitor
      metadata:
        labels:
          app.kubernetes.io/component: app
          app.kubernetes.io/instance: kong
          app.kubernetes.io/name: controller
        name: kong-controller
        namespace: kong
      spec:
        podMetricsEndpoints:
          - path: /metrics
            targetPort: cmetrics
        selector:
          matchLabels:
            app.kubernetes.io/component: app
            app.kubernetes.io/instance: kong
            app.kubernetes.io/name: controller
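
A couple of sanity checks after applying it (the deployment name and the 10255 port are assumptions; adjust them to your release name and the controller's metrics bind address):

➜ k -n kong get podmonitor kong-controller
➜ k -n kong port-forward deploy/kong-controller 10255:10255
➜ curl -s localhost:10255/metrics | head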

rainest commented 7 months ago

Yeah, this looks off. I'm not clear on why we actually have a ServiceMonitor here, as both the gateway status and controller metrics ports are designed not to have a Service, since they expose per-Pod information. The docs for this look quite out of date; my best guess is that we originally had a ServiceMonitor for the admin API, as earlier versions did not have the dedicated status listener or controller metrics.

PodMonitor is probably the correct choice here, and I'm not entirely sure why we didn't switch to it. https://github.com/Kong/kubernetes-ingress-controller/issues/1770#issuecomment-931737778 suggests that we kept ServiceMonitor to avoid a breaking change, but we can probably handle this with a legacy-behavior check that still honors serviceMonitor.enabled as equivalent to podMonitor.enabled. I don't know if there are other settings that differ between the two that would require further breaking changes.
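
For example, a legacy-behavior check along these lines could keep serviceMonitor.enabled working as an alias for a new podMonitor.enabled flag (sketch only; .Values.podMonitor and the selector labels are assumptions, and podMonitor.enabled would need a default in values.yaml so the lookup doesn't fail):

{{- if or .Values.podMonitor.enabled .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: {{ template "kong.fullname" . }}
spec:
  podMetricsEndpoints:
    - path: /metrics
      targetPort: cmetrics
  selector:
    matchLabels:
      app.kubernetes.io/instance: {{ .Release.Name }}
      app.kubernetes.io/name: controller
{{- end }}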

Per that other comment, ServiceMonitor actually should (unintuitively) be fine for ports that lack Services, but I don't recall the specifics of how that works in the upstream operator. My best guess there is that this worked with kong/kong, but the split Deployments of kong/ingress break it. In any case it's kinda wonky to use it that way; we should really consider switching to PodMonitor.

Tentative AC: