Open lukas-vlcek opened 7 months ago
Is there any progress on this task? I would like to use Prometheus to scrape OpenSearch metrics and use Grafana dashboards to monitor.
This tutorial is very much needed; I've been through several attempts to get Prometheus to scrape an endpoint on Kubernetes with no success.
Just for the record, the following is a Slack thread we had with @smbambling on this topic: https://opensearch.slack.com/archives/C051JEH8MNU/p1715262647976709
I've attempted to configure a scrape endpoint for Prometheus to OpenSearch `_prometheus/metrics` via two separate methods.
Notes:
Method 1: Static Prometheus configs

In this method I've modified the kube-prometheus-stack Helm value overrides in order to apply additional scrape configs. In the values below I've tested multiple different combinations of configs:

- `insecure_skip_verify: true` with no other `tls_config` options set
- `insecure_skip_verify: false` with `ca_file` set
- `max_version: TLS12` both set and not set
- `cert_file` + `key_file` both set and not set

```yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: opensearch-job
        metrics_path: /_prometheus/metrics
        scheme: https
        static_configs:
          - targets:
              - opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200
        basic_auth:
          username: "admin"
          password: "myfakePW"
        tls_config:
          insecure_skip_verify: true
          max_version: TLS12
          ca_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/ca.crt
          cert_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/tls.crt
          key_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/tls.key
```
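For the `ca_file`/`cert_file`/`key_file` paths above to exist inside the Prometheus container, kube-prometheus-stack has to be told to mount the Secret: the operator mounts Secrets listed under `prometheusSpec.secrets` at `/etc/prometheus/secrets/<secret-name>/`. A minimal sketch of the corresponding value override, assuming a Secret named `my-internal-wildcard-my-tls-certs` exists in the same namespace as the Prometheus object:

```yaml
prometheus:
  prometheusSpec:
    # Each listed Secret is mounted read-only into the Prometheus pods at
    # /etc/prometheus/secrets/<secret-name>/, matching the tls_config paths above.
    secrets:
      - my-internal-wildcard-my-tls-certs
```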
From another pod in the monitoring namespace where Prometheus is running (no curl installed in the Prometheus container itself), I'm able to curl the internal service DNS name set above.

With the CA cert referenced:

```shell
$ curl -XGET --cacert /tmp/foo -u 'admin:myfakePW' 'https://opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200/_prometheus/metrics' | head
# HELP opensearch_jvm_mem_pool_max_bytes Maximum usage of memory pool
# TYPE opensearch_jvm_mem_pool_max_bytes gauge
opensearch_jvm_mem_pool_max_bytes{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",pool="survivor",} 0.0
```
And without referencing the CA cert:

```shell
$ curl -k -u 'admin:tes+1Passw*rd2' 'https://opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200/_prometheus/metrics' | head
# HELP opensearch_indices_get_count Count of get commands
# TYPE opensearch_indices_get_count gauge
opensearch_indices_get_count{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",} 0.0
opensearch_indices_get_count{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-hot-data-0",nodeid="-Modhwt_TMiOd4f4rSSPhg",} 48.0
```
Method 2: Using a Prometheus ServiceMonitor

In this method I've created a ServiceMonitor for kube-prometheus-stack to read and generate scrape targets. Below is the output for my created ServiceMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: opensearch-master
    meta.helm.sh/release-namespace: opensearch
  creationTimestamp: "2024-05-08T14:51:02Z"
  generation: 12
  labels:
    app.kubernetes.io/component: opensearch-localk3s-cl1-master
    app.kubernetes.io/instance: opensearch-master
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opensearch
    app.kubernetes.io/version: 2.11.1
    helm.sh/chart: opensearch-2.17.0
    release: kube-prometheus-stack
  name: opensearch-service-monitor
  namespace: monitoring
  resourceVersion: "141672"
  uid: cf1df5d5-a855-4eb1-8cb5-da2ddaad99f6
spec:
  endpoints:
    - basicAuth:
        password:
          key: password
          name: opensearch-service-monitor-basic-auth
        username:
          key: username
          name: opensearch-service-monitor-basic-auth
      interval: 10s
      path: /_prometheus/metrics
      port: http
      scheme: https
      tlsConfig:
        ca: {}
        insecureSkipVerify: true
  namespaceSelector:
    matchNames:
      - opensearch
  selector:
    matchLabels:
      app.kubernetes.io/component: opensearch-localk3s-cl1-master
      app.kubernetes.io/instance: opensearch-master
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: opensearch
      app.kubernetes.io/version: 2.11.1
      helm.sh/chart: opensearch-2.17.0
```
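For reference, the `basicAuth` section above points at a Secret named `opensearch-service-monitor-basic-auth`, which prometheus-operator resolves from the same namespace as the ServiceMonitor. A minimal sketch of what that Secret could look like (hypothetical values standing in for the redacted credentials used earlier):

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Must match the basicAuth secret name referenced by the ServiceMonitor,
  # and live in the ServiceMonitor's namespace.
  name: opensearch-service-monitor-basic-auth
  namespace: monitoring
type: Opaque
stringData:
  username: admin
  password: myfakePW  # placeholder, not a real credential
```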
Again, multiple different combinations of configs were tested within the ServiceMonitor, which provided the same end result: the scrape endpoints are created, but there is an SSL handshake issue for Prometheus.
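One combination worth writing out: instead of `insecureSkipVerify: true` with an empty `ca: {}`, the endpoint's `tlsConfig` can reference the CA from a Secret and pin the expected server name. A sketch, assuming the TLS Secret from Method 1 is also present in the ServiceMonitor's namespace:

```yaml
tlsConfig:
  insecureSkipVerify: false
  ca:
    secret:
      name: my-internal-wildcard-my-tls-certs
      key: ca.crt
  # serverName must match a SAN in the certificate OpenSearch presents;
  # the pod-IP targets generated by the ServiceMonitor won't match otherwise.
  serverName: opensearch-localk3s-cl1-master.opensearch.svc.cluster.local
```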
Just as verification, I could also curl from the same pod used in Method 1 to the cluster IP endpoints generated via the ServiceMonitor:

```shell
$ curl -u 'admin:myfakePW' -k https://10.42.0.69:9200/_prometheus/metrics | head
# HELP opensearch_indices_refresh_total_time_seconds Time spent while refreshes
# TYPE opensearch_indices_refresh_total_time_seconds gauge
opensearch_indices_refresh_total_time_seconds{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",} 0.0
opensearch_indices_refresh_total_time_seconds{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-hot-data-0",nodeid="-Modhwt_TMiOd4f4rSSPhg",} 174.781
```
In the end, both methods produce the same SSL handshake errors in the Prometheus UI.
Thanks @smbambling for putting in the effort to write it all down.
In our testing setup we had a restricted cipher list in `plugins.security.ssl.transport.enabled_ciphers`; commenting this out allowed Prometheus to scrape the endpoints and gather data.
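For anyone hitting the same handshake failure, the restriction looked roughly like the fragment below in `opensearch.yml` (the cipher names here are illustrative, not our exact list). Removing or commenting out the setting falls back to the default cipher suites; the analogous HTTP-layer setting in the security plugin is `plugins.security.ssl.http.enabled_ciphers`.

```yaml
# Restricting the allowed cipher suites can break clients (e.g. Prometheus)
# that don't offer any of the listed suites during the TLS handshake.
# Commenting this out restores the defaults, which fixed scraping for us.
# (Cipher names below are for illustration only.)
# plugins.security.ssl.transport.enabled_ciphers:
#   - "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
#   - "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
```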
I want to ask something: does this mean OpenSearch provides the metrics data to Prometheus, or Prometheus provides the metrics data to OpenSearch?
@rarifz This installs an exporter that exposes metrics about OpenSearch, which Prometheus can be configured to scrape.
There is a lack of a complete tutorial on how to set up an OpenSearch cluster with the plugin in K8s and have Prometheus scraping the metrics endpoint.
See: https://forum.opensearch.org/t/prometheus-not-able-to-scrape-metrics-on-pod/16908/
Idea: this setup flow should be part of the plugin's release process, or even the CI (?)