Aiven-Open / prometheus-exporter-plugin-for-opensearch

Prometheus exporter plugin for OpenSearch & OpenSearch Mixin
Apache License 2.0
110 stars, 34 forks

[Tutorial] Write a complete tutorial on how to set up OpenSearch with the plugin in K8s and have Prometheus scrape it #240

Open lukas-vlcek opened 7 months ago

lukas-vlcek commented 7 months ago

There is no complete tutorial on how to set up an OpenSearch cluster with the plugin in K8s and have Prometheus scrape the metrics endpoint.

See: https://forum.opensearch.org/t/prometheus-not-able-to-scrape-metrics-on-pod/16908/

Idea: this setup flow should be part of the plugin's release process, or even the CI (?)

layavadi commented 2 months ago

Is there any progress on this task? I would like to use Prometheus to scrape OpenSearch metrics and use Grafana dashboards for monitoring.

smbambling commented 1 month ago

This tutorial is very much needed. I've made several attempts to get Prometheus to scrape an endpoint on Kubernetes, with no success.

lukas-vlcek commented 1 month ago

Just for the record, the following Slack thread with @smbambling covers this topic: https://opensearch.slack.com/archives/C051JEH8MNU/p1715262647976709

smbambling commented 1 month ago

I've attempted to configure Prometheus to scrape the OpenSearch /_prometheus/metrics endpoint via two separate methods.

Notes:

Method 1: Static Prometheus configs

In this method, I modified the kube-prometheus-stack Helm values override to add extra scrape configs.

In the values below, I've tested multiple different combinations of configs:

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: opensearch-job
        metrics_path: /_prometheus/metrics
        scheme: https
        static_configs:
          - targets:
              - opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200
        basic_auth:
          username: "admin"
          password: "myfakePW"
        tls_config:
          insecure_skip_verify: true
          max_version: TLS12
          ca_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/ca.crt
          cert_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/tls.crt
          key_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/tls.key

From another pod in the monitoring namespace, where Prometheus runs (curl is not installed in the Prometheus container), I'm able to curl the internal service DNS name set above:

--- with the CA cert referenced
$ curl -XGET --cacert /tmp/foo -u 'admin:myfakePW' 'https://opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200/_prometheus/metrics' | head
# HELP opensearch_jvm_mem_pool_max_bytes Maximum usage of memory pool
# TYPE opensearch_jvm_mem_pool_max_bytes gauge
opensearch_jvm_mem_pool_max_bytes{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",pool="survivor",} 0.0

AND

--- without referencing the CA cert
$ curl -k -u 'admin:tes+1Passw*rd2' 'https://opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200/_prometheus/metrics' | head
# HELP opensearch_indices_get_count Count of get commands
# TYPE opensearch_indices_get_count gauge
opensearch_indices_get_count{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",} 0.0
opensearch_indices_get_count{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-hot-data-0",nodeid="-Modhwt_TMiOd4f4rSSPhg",} 48.0

smbambling commented 1 month ago

I've attempted to configure Prometheus to scrape the OpenSearch /_prometheus/metrics endpoint via two separate methods.

Notes:

Method 2: Using Prometheus Service Monitor

In this method, I created a ServiceMonitor that kube-prometheus-stack reads to generate scrape targets.

Below is the output for the ServiceMonitor I created:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: opensearch-master
    meta.helm.sh/release-namespace: opensearch
  creationTimestamp: "2024-05-08T14:51:02Z"
  generation: 12
  labels:
    app.kubernetes.io/component: opensearch-localk3s-cl1-master
    app.kubernetes.io/instance: opensearch-master
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opensearch
    app.kubernetes.io/version: 2.11.1
    helm.sh/chart: opensearch-2.17.0
    release: kube-prometheus-stack
  name: opensearch-service-monitor
  namespace: monitoring
  resourceVersion: "141672"
  uid: cf1df5d5-a855-4eb1-8cb5-da2ddaad99f6
spec:
  endpoints:
  - basicAuth:
      password:
        key: password
        name: opensearch-service-monitor-basic-auth
      username:
        key: username
        name: opensearch-service-monitor-basic-auth
    interval: 10s
    path: /_prometheus/metrics
    port: http
    scheme: https
    tlsConfig:
      ca: {}
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - opensearch
  selector:
    matchLabels:
      app.kubernetes.io/component: opensearch-localk3s-cl1-master
      app.kubernetes.io/instance: opensearch-master
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: opensearch
      app.kubernetes.io/version: 2.11.1
      helm.sh/chart: opensearch-2.17.0
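The basicAuth block above references a Secret named opensearch-service-monitor-basic-auth with username and password keys. As far as I understand, the Prometheus Operator reads that Secret from the ServiceMonitor's own namespace (monitoring here), not from the opensearch namespace. A minimal sketch of such a Secret, reusing the placeholder credentials from Method 1 (the values are illustrative only):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: opensearch-service-monitor-basic-auth
  # Same namespace as the ServiceMonitor that references it
  namespace: monitoring
type: Opaque
stringData:
  # Keys must match the basicAuth.username.key / password.key fields
  username: admin
  password: myfakePW
```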

Again, multiple different combinations of configs were tested within the ServiceMonitor, all with the same end result: the scrape endpoints are created, but Prometheus fails with an SSL handshake error.

As a check, I could also curl the endpoint IPs generated via the ServiceMonitor from the same pod used in Method 1:

$ curl -u 'admin:myfakePW' -k https://10.42.0.69:9200/_prometheus/metrics | head
# HELP opensearch_indices_refresh_total_time_seconds Time spent while refreshes
# TYPE opensearch_indices_refresh_total_time_seconds gauge
opensearch_indices_refresh_total_time_seconds{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",} 0.0
opensearch_indices_refresh_total_time_seconds{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-hot-data-0",nodeid="-Modhwt_TMiOd4f4rSSPhg",} 174.781

In the end, both methods produce the following errors in the Prometheus UI:

[Screenshot, 2024-05-09 10:13:11 AM: scrape errors in the Prometheus UI]

 

lukas-vlcek commented 1 month ago

Thanks, @smbambling, for putting in the effort to write it all down.

smbambling commented 1 month ago

In our test setup, we had restricted the ciphers in plugins.security.ssl.transport.enabled_ciphers; commenting this out allowed Prometheus to scrape the endpoints and gather data.
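For reference, the change amounts to something like the opensearch.yml excerpt below. The cipher name is a hypothetical placeholder, since the thread does not say which ciphers were configured. One plausible but unconfirmed reason curl succeeded while Prometheus failed: Prometheus uses Go's crypto/tls, which implements a narrower set of cipher suites than OpenSSL-based curl (for example, no DHE suites), so a restricted list can leave no suite in common. Note that plugins.security.ssl.http.enabled_ciphers is the analogous setting for the REST layer on port 9200.

```yaml
# opensearch.yml (excerpt)
# A restricted cipher list like this one broke the Prometheus scrape above.
# The cipher name is a hypothetical placeholder, not the original value.
# plugins.security.ssl.transport.enabled_ciphers:
#   - "TLS_DHE_RSA_WITH_AES_256_GCM_SHA384"
#
# With the list commented out, the default cipher selection applies.
```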

rarifz commented 4 weeks ago

I want to ask something: does this mean OpenSearch provides the metrics data to Prometheus, or Prometheus provides the metrics data to OpenSearch?

smbambling commented 1 week ago

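@rarifz This installs an exporter that exposes OpenSearch metrics on an HTTP endpoint (/_prometheus/metrics), which Prometheus can be configured to scrape. In other words, Prometheus pulls the metrics from OpenSearch, not the other way around.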