Kuadrant / limitador-operator

Production-ready: Configure Observability #77

Open slopezz opened 1 year ago

slopezz commented 1 year ago

In 3scale SaaS we have been successfully using limitador together with Redis for a couple of years to protect all our public endpoints. However:

We would like to update how we manage the limitador application and move to the recommended setup based on limitador-operator, at a production-ready grade.

Current limitador-operator (at least version 0.4.0, which we use):

Desired features:

3scale SaaS specific example

Example of the PodMonitor used in 3scale SaaS production to handle between 3,500 and 5,500 requests/second with 3 limitador pods (the selector labels need to match the labels currently managed by limitador-operator):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: limitador
spec:
  podMetricsEndpoints:
    - interval: 30s
      path: /metrics
      port: http
      scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: limitador

Possible CR config

Both the PodMonitor and the GrafanaDashboard should be customizable via the CR, but use sane default values when they are enabled, so you don't need to provide all the config if you prefer to rely on the defaults.
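
As a rough sketch of what that CR config could look like (the observability section and its field names below are hypothetical, they are not part of the current Limitador CRD; only the limitador.kuadrant.io/v1alpha1 Limitador kind is existing):

apiVersion: limitador.kuadrant.io/v1alpha1
kind: Limitador
metadata:
  name: limitador
spec:
  # ... existing limitador spec (replicas, storage, limits, ...) ...
  observability:            # hypothetical section, only for illustration
    podMonitor:
      enabled: true
      interval: 30s         # optional override, sane default otherwise
    grafanaDashboard:
      enabled: true
      labels:               # labels applied to the generated GrafanaDashboard
        discovery: enabled  # so grafana-operator picks it up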

The initial dashboard would be provided by us (3scale SRE) and can be embedded into the operator as an asset, as done with 3scale-operator.

Screenshots of the current dashboard include limitador metrics split by limitador_namespace (the app being limited), as well as pod CPU/memory/network resource metrics.

PrometheusRules (aka prometheus alerts)

Regarding PrometheusRules (Prometheus alerts), my advice is not to embed them into the operator, but to provide in the repo a YAML with an example of possible alerts that can be deployed, tuned, etc. by the app administrator if needed.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: limitador
spec:
  groups:
    - name: limitador.rules
      rules:
        - alert: LimitadorJobDown
          annotations:
            message: Prometheus Job {{ $labels.job }} on {{ $labels.namespace }} is DOWN
          expr: up{job=~".*limitador.*"} == 0
          for: 5m
          labels:
            severity: critical

        - alert: LimitadorPodDown
          annotations:
            message: Limitador pod {{ $labels.pod }} on {{ $labels.namespace }} is DOWN
          expr: limitador_up == 0
          for: 5m
          labels:
            severity: critical
Boomatang commented 1 year ago

I have a few points for decision which will affect how these changes are done. The limitador-operator allows multiple limitador CRs in the same namespace and/or in multiple namespaces.

  1. Do we expect there to be a separate PodMonitor for every limitador CR? This seems wasteful, as we could monitor many pods and namespaces with a single PodMonitor (a rough sketch of such a shared PodMonitor is included after this list), but that brings its own issues.
  2. If the user does not configure the podMonitor section of one limitador CR instance, but other instances are configured for pod monitors, should the non-configured instances also have a podMonitor attached? I believe podMonitors should not be added to the non-configured instances.
  3. I would expect that if the podMonitor configuration in two different limitador CRs states different label selectors, limitador-operator would create two different podMonitor configurations. This would mean that before creating any podMonitors the limitador-operator would first need to find any existing podMonitor CRs to update. The question then is who is responsible for removing the podMonitors during an uninstall? I would assume the last limitador CR to be removed. I am assuming we do not configure the podMonitors to check all namespaces, but only the namespace we specify.
  4. If there is one podMonitor for a number of limitador CR instances, it would be reasonable to expect that there should be one dashboard to cover that selector label. Can the current 3scale dashboard handle multiple namespaces and instances of limitador?
  5. If a user is adding different label selectors for grafanaDashboards, is it possible there can be multiple Grafana instances on the cluster? I am not sure how this would affect the deployment of the dashboards or pod monitors, but it is something to look into.
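
For reference on point 1, a single shared PodMonitor covering limitador pods in any namespace could look roughly like this (only a sketch; the selector label is taken from the example above and may not match what limitador-operator actually sets):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: limitador-all
spec:
  namespaceSelector:
    any: true               # scrape matching pods in every namespace
  podMetricsEndpoints:
    - interval: 30s
      path: /metrics
      port: http
      scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: limitador
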
slopezz commented 1 year ago

Hi @Boomatang ,

PodMonitor

For simplicity, I would treat the PodMonitor for a given limitador CR as any other usual resource attached to the limitador CR, like the Service or the Deployment.

That means each Limitador CR will have its own PodMonitor, with its own labelSelectors taken from the limitador CR, the same way it has its own Deployment or Service, keeping things simple.
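
As a sketch of that per-CR approach (the per-instance label key below is an assumption; the actual labels set by limitador-operator may differ), a Limitador CR named limitador-foo would get something like:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: limitador-foo
spec:
  podMetricsEndpoints:
    - interval: 30s
      path: /metrics
      port: http
      scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: limitador
      # hypothetical per-instance label so each PodMonitor only
      # selects the pods of its own Limitador CR
      limitador-resource: limitador-foo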

GrafanaDashboard

Regarding the dashboard, I didn't know limitador-operator could manage limitador CR instances in different namespaces, or even multiple instances in the same namespace.

In our 3scale SaaS use case, we have a single limitador instance managing rate limits for any given namespace (since the k8s Service name is what Envoy uses to reach it), and I guess that having a single instance would be the most usual case.

It is a bit tricky here, since with a single GrafanaDashboard you can view the metrics from any possible limitador instance (you would just need to use the limitador instance name as a dashboard selector, aside from the namespace maybe).

If you have multiple limitador instances in different namespaces, by default grafana-operator will create a GrafanaDashboard in every namespace, since the namespace name is used as the dashboard folder name.
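
For reference, the GrafanaDashboard CR created in each namespace could look roughly like this, assuming the grafana-operator v4 API (integreatly.org/v1alpha1); the label and the embedded JSON below are only placeholders:

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: limitador
  labels:
    discovery: enabled      # must match the dashboardLabelSelector of the Grafana instance
spec:
  # the real dashboard JSON would be embedded as an operator asset;
  # this is only a minimal placeholder
  json: |
    {
      "title": "Limitador",
      "panels": []
    }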

However, what about multiple instances in the same namespace? TBH I don't know the best way to handle this situation, since any object created by the operator will add its own ownerReference with the CR name, and if you have multiple CRs creating the same dashboard name (so the same resource) in the same namespace, it will only have the ownerReferences from one of them (which is not super bad, but maybe not ideal).

In our saas-operator case, we have multiple CRDs that can only be installed once per namespace, but there is a single case where the CRD can have multiple instances in the same namespace. There we ended up creating the same dashboard for every CR, using the CR name as a suffix for the dashboard name, so we have multiple dashboards showing the same info, which is actually not an ideal solution...

We have another scenario with prometheus-exporter-operator, where we permit multiple CRs per namespace but only a single dashboard per CR, so we end up with a single dashboard per namespace used to watch metrics from any CR; however, the ownerReferences of the dashboard resource point to a single CR. If the CR associated with the dashboard is deleted, the dashboard resource will be deleted thanks to ownerReferences, but the operator will detect there is a missing dashboard resource for the other instances and will create the dashboard again, with a different dashboard ID (which matters if you want to keep the same dashboard URL).

So I'm not sure what the best solution would be here.

Do you know how other cluster operators manage the GrafanaDashboard when you can have multiple instances even in the same namespace?