bottlerocket-os / bottlerocket-update-operator

A Kubernetes operator for automated updates to Bottlerocket

Brupop is Not Sending Metrics to Datadog #644

Open Gaurav2586 opened 1 month ago

Gaurav2586 commented 1 month ago

Despite making the necessary changes to expose OpenMetrics from the brupop-controller and configuring the Datadog Agent to scrape these metrics, we are not seeing any metrics data in Datadog.

Adding annotations is not possible with the provided Brupop Helm chart, so I added the annotation to my container manually (FYI).

My files look like this:


apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: brupop-controller
  namespace: {{ .Values.namespace }}
  labels:
    app.kubernetes.io/component: brupop-controller
    app.kubernetes.io/managed-by: brupop
    app.kubernetes.io/part-of: brupop
    brupop.bottlerocket.aws/component: brupop-controller
spec:
  endpoints:
  - port: http-metrics
    path: /metrics
  namespaceSelector:
    matchNames:
      - {{ .Values.namespace }}
  selector:
    matchLabels:
      brupop.bottlerocket.aws/component: brupop-controller

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: "8080"
    prometheus.io/scrape: "true"
  labels:
    app.kubernetes.io/component: brupop-controller
    app.kubernetes.io/managed-by: brupop
    app.kubernetes.io/part-of: brupop
    brupop.bottlerocket.aws/component: brupop-controller
  name: brupop-controller-server
  namespace: {{ .Values.namespace }}
spec:
  ports:
    - name: http-metrics
      port: 8080
      targetPort: http-metrics
      protocol: TCP
  selector:
    brupop.bottlerocket.aws/component: brupop-controller

Deployment file changes in controller-deployment.yaml (I added the config below):

          ports:
            - name: http-metrics
              containerPort: 8080
              protocol: TCP

I also added a Datadog Agent annotation for the container, as shown below:

- name: brupop-operator  # Full name: bottlerocket-update-operator
  namespace: brupop-bottlerocket-aws
  version: "1.1.0"
  chartName: bottlerocket-update-operator
  values:
    podAnnotations:
      ad.datadoghq.com/controller.checks: |
        {
          "brupop": {
            "init_config": {},
            "instances": [
              {
                "openmetrics_endpoint": "http://%%host%%:8080/metrics"
              }
            ]
          }
        }
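
For comparison, here is a sketch of the annotation shape that Datadog's Autodiscovery documentation describes for the OpenMetrics check. There, the top-level key inside the JSON is the integration name (`openmetrics`), and each instance carries a `namespace` and a `metrics` list alongside `openmetrics_endpoint`. The container name `controller`, the namespace value `brupop`, and the catch-all `.*` pattern are assumptions for illustration, not values confirmed in this thread.

```yaml
# Pod template metadata (sketch only; names marked below are assumptions)
metadata:
  annotations:
    # "controller" must match the container name in the pod spec (assumed here)
    ad.datadoghq.com/controller.checks: |
      {
        "openmetrics": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics",
              "namespace": "brupop",
              "metrics": [".*"]
            }
          ]
        }
      }
```

Here `namespace` is the prefix Datadog puts on the collected metric names, so with the assumed value the metrics would appear under `brupop.*`.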

Please let me know if I am missing anything on my end, or whether sending metrics to Datadog via OpenMetrics is simply not possible.

cbgbt commented 1 month ago

Hello. Can you share which version of Brupop you are attempting to configure?

The ability to create a ServiceMonitor via Helm values was recently released in Brupop 1.4.0.

You can see the relevant helm values here

Gaurav2586 commented 1 month ago

I am using version 1.3.0, but I have already installed a ServiceMonitor. Does it only work with version 1.4.0? My OpenMetrics endpoint setting looks like this:

"openmetrics_endpoint": "http://%%host%%:8080/metrics"

An annotation is missing in the controller template. Can I create a PR for it? The annotation is used to send metrics to Datadog via openmetrics_endpoint.

cbgbt commented 1 month ago

It should work with 1.3.0. I'll take a closer look to see if I can get a similar setup.

I've noticed a section in the DataDog documentation mentioning autodiscovery via Prometheus annotations. Is that something that could possibly work for you?
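
A minimal sketch of what enabling that autodiscovery might look like, assuming the Datadog Agent is deployed with the official `datadog` Helm chart (the deployment method is not stated in this thread). With `prometheusScrape` enabled, the Agent looks for the `prometheus.io/scrape`, `prometheus.io/port`, and `prometheus.io/path` annotations on pods. `serviceEndpoints` extends this to annotated Services such as the `brupop-controller-server` Service shown earlier.

```yaml
# values.yaml for the Datadog Helm chart (assumed deployment method)
datadog:
  prometheusScrape:
    enabled: true           # scrape pods annotated with prometheus.io/scrape: "true"
    serviceEndpoints: true  # also scrape endpoints behind annotated Services
```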

> An annotation is missing in the controller template, can I create a PR for the same?

I think rather than adding additional annotations here, we probably instead want to expose the annotations field wholesale as we've done for e.g. placement settings. For compatibility, we'd want the default to maintain the existing prometheus annotations.

A PR for that would be welcome. I cut #646 as a separate issue for that, as I think it's a worthwhile feature even if you can resolve your issue using the DataDog autodiscovery feature.

Gaurav2586 commented 1 month ago

Metrics are now reaching Datadog and working fine. Let me know if I can add a Datadog integration block to the README so that other people can benefit from it.

I have one general question. I deployed Brupop in my EKS clusters across three environments: Dev, Test, and Prod. I observed that as soon as a new Bottlerocket image is available in the AWS public repository, the Brupop operator starts upgrading the instances. Is it possible to hold back the rollout to Prod for a while, for safety? We would like to patch Dev first, evaluate it, and then deploy to Prod. Is there any process or idea for this?

cbgbt commented 1 month ago

> Metrics are now reaching Datadog and working fine. Let me know if I can add a Datadog integration block to the README so that other people can benefit from it.

That could be useful, although the README is pretty expansive. Mind sharing the configuration here? We could discuss adding it to the README, but it might also help if this thread turns up in web searches.

It would also be helpful to inform the design for #646.

> This means we would like to patch Dev first, evaluate it, and then deploy it to Prod. Any process or idea around this?

It would require some work on your side, but it's helpful to know that Brupop always respects the Bottlerocket settings that are applied to the host being updated.

What this means is that you can apply a version lock to all of your instances. Brupop will ensure that these instances remain at the locked version. You would then need to instrument a system to update settings.updates.version-lock to the desired version in each environment.

If you implement this, you may also wish to take a look at the settings.updates.ignore-waves setting, which affects when your host is able to see an update.
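
As a rough sketch of the shape of these settings, the fragment below pins a node group to a fixed Bottlerocket version through an eksctl `ClusterConfig`. Using eksctl is an assumption (the thread does not say how the clusters are provisioned), and the cluster name, region, node group name, and version value are hypothetical placeholders. Settings passed this way are applied as user data at node launch, so changing the lock on running hosts would need a separate mechanism, such as the Bottlerocket API.

```yaml
# Hypothetical eksctl ClusterConfig for the Prod environment (sketch only)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster          # hypothetical cluster name
  region: us-west-2           # hypothetical region
nodeGroups:
  - name: bottlerocket-nodes  # hypothetical node group name
    amiFamily: Bottlerocket
    bottlerocket:
      settings:
        updates:
          # Maps to settings.updates.version-lock; placeholder version.
          version-lock: "v1.20.0"
          # Maps to settings.updates.ignore-waves; false keeps wave-based rollout.
          ignore-waves: false
```

Under this approach, Dev could stay on `latest` (the default) while Prod keeps a pinned version that is raised only after Dev has been evaluated.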