documentation for usage in combination with security plugins

rursprung commented 2 years ago

background

this is the third attempt in the third repo to get an answer to https://github.com/vvanholl/elasticsearch-prometheus-exporter/issues/324 and https://github.com/aparo/opensearch-prometheus-exporter/issues/4 🙂

there are various security plugins available for OpenSearch (OpenSearch Security and SearchGuard with their upcoming release) and OpenSearch Security is included and enabled by default in the normal distribution. is there any documentation on how to use opensearch-prometheus-exporter in combination with them?

problems

i see two aspects to this:

- this plugin does REST calls to OpenSearch, so if security is enabled there it needs to be able to
  - provide authentication
  - validate & accept the TLS cert (it might well be a self-signed CA at the top which isn't accepted by the prometheus scraper!)
- since the prometheus plugin exposes a new path on the existing 9200 port instead of opening a dedicated port, it is behind HTTPS (if this is enabled in the security plugin) and behind authentication. but when using prometheus in a kubernetes environment with automatic scraping there's no good way to configure the scraper to accept self-signed CAs or provide authentication

our workaround (solution?)

this is how we got it to work:

there's a role for prometheus (both to access the prometheus metrics as well as for the prometheus plugin itself to access the internal metrics; mixed here because as you can see in the role mapping below it's anyway using the same mechanism):

read_prometheus:
  cluster_permissions:
    - "cluster:monitor/prometheus/metrics" # allow access to the prometheus plugin (the prometheus metrics collector doesn't send authentication information)
    # allow the plugin to access the required metrics (the plugin also doesn't send authentication information)
    - "cluster:monitor/health"
    - "cluster:monitor/state"
    - "cluster:monitor/nodes/info"
    - "cluster:monitor/nodes/stats"
  index_permissions:
    - index_patterns:
      - "*"
      allowed_actions:
      - "indices:monitor/stats"

this is then mapped for all anonymous users in a role mapping (which IMHO is bad practice as it requires enabling anonymous auth in the first place):

read_prometheus:
  backend_roles:
    - opendistro_security_anonymous_backendrole

this of course requires config.dynamic.http.anonymous_auth_enabled: true to be set in the security config. note: this is currently undocumented in OpenSearch, i've raised the corresponding docs ticket: https://github.com/opensearch-project/documentation-website/issues/627
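
for reference, a minimal sketch of where that setting lives in the security plugin's config.yml. only the `anonymous_auth_enabled` key path comes from the description above; the `_meta` header is what a typical config.yml carries and is an assumption about your setup:

```yaml
# config.yml of the security plugin (sketch; merge into your existing file)
_meta:
  type: "config"
  config_version: 2

config:
  dynamic:
    http:
      # lets unauthenticated requests through as the anonymous user, which
      # carries the backend role used in the role mapping above
      anonymous_auth_enabled: true
```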

if the prometheus scraper doesn't know the CA used for the TLS certificates on the http port then you might also have to disable TLS on OpenSearch (or re-configure the prometheus scraper).

with this in place it's then possible for prometheus to scrape the metrics.
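
if you'd rather re-configure the scraper than disable TLS, something like the following Prometheus scrape config might work. this is only a sketch: the job name, target host and CA file path are made up for illustration, and it assumes a statically configured Prometheus rather than annotation-based discovery:

```yaml
scrape_configs:
  - job_name: opensearch                       # hypothetical job name
    scheme: https
    metrics_path: /_prometheus/metrics
    tls_config:
      # hypothetical path to the CA which signed the OpenSearch HTTP certificate
      ca_file: /etc/prometheus/certs/opensearch-ca.crt
    static_configs:
      - targets:
          - opensearch.example.internal:9200   # hypothetical host
```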

alternatives

it would also be possible to set up basic authentication on OpenSearch with a dedicated user for prometheus and then let the prometheus scraper use this user to read the data. however, this also isn't particularly secure (e.g. when using the k8s annotations for prometheus and having to define the username/password there...) and requires having basic auth enabled in the first place (which isn't what we want given that we have no other usage for it and adding a new auth realm for one single use-case opens a whole new can of worms).
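
for completeness, a sketch of what the scraper side of this alternative could look like when Prometheus is configured through a scrape config instead of annotations. the job name, user name, password file path and target host are all hypothetical:

```yaml
scrape_configs:
  - job_name: opensearch-basic-auth            # hypothetical job name
    scheme: https
    metrics_path: /_prometheus/metrics
    basic_auth:
      username: prometheus                     # hypothetical dedicated user
      # reading the password from a file keeps it out of the scrape config itself
      password_file: /etc/prometheus/secrets/opensearch-password
    static_configs:
      - targets:
          - opensearch.example.internal:9200   # hypothetical host
```

this avoids putting the password into annotations, but it still needs basic auth to be enabled on the OpenSearch side.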

required solution

preferred solution

- no config is needed for the plugin to be able to access the metrics, it handles this internally (at least for OpenSearch Security as it should IMHO have first-party integration with that)
- no config is needed for the plugin to be accessible from the outside (to avoid both the authentication issues and the certificate issues you could potentially spin up your own HTTP(-only) server rather than relying on the OpenSearch mechanism to provide the endpoint)

minimal solution

the best practices for the (manual) security setup are documented

AndersBennedsgaard commented 1 year ago

We have Prometheus and OpenSearch running in a Kubernetes cluster, and what we do for Prometheus scraping with security enabled, is to:

- Create Prometheus user+role with "cluster_permissions": ["cluster:monitor/prometheus/metrics", "cluster:monitor/health", "cluster:monitor/nodes/info", "cluster:monitor/nodes/stats", "cluster:monitor/state"] (we don't use detailed index level metrics to circumvent the high-cardinality issue)
- Enable the clientcert authentication backend using the common name as username attribute
- Create a certificate with common name that fits the user mentioned above (using Cert-Manager)
- Create a ServiceMonitor using the Prometheus operator, which looks like:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: opensearch-metrics
  namespace: opensearch
spec:
  endpoints:
    - path: /_prometheus/metrics
      port: http
      scheme: https
      tlsConfig:
        # Not possible to verify the certificate, since Prometheus targets IP addresses
        # directly instead of using Kubernetes DNS, which aren't included in their certificates
        insecureSkipVerify: true
        # Used for authentication
        ca:
          secret:
            name: opensearch-prometheus-http-crt
            key: ca.crt
        cert:
          secret:
            name: opensearch-prometheus-http-crt
            key: tls.crt
        keySecret:
          name: opensearch-prometheus-http-crt
          key: tls.key
  selector:
    matchLabels:
      app.kubernetes.io/name: opensearch

The important part here is the spec.endpoints, where we specify how Prometheus service-discovery is configured. In the tlsConfig I reference the Secret which contains certificates for encrypting communication towards the HTTP port, where the certificates are used to authenticate with the prometheus user created above.
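
The first three steps could look roughly like the sketch below. This is not our exact configuration: the role name, user name and issuer name are hypothetical, and each YAML document belongs in a different place (three security plugin files and one Kubernetes manifest), as noted in the comments.

```yaml
# roles.yml (security plugin): role for the Prometheus user
prometheus:                          # hypothetical role name
  cluster_permissions:
    - "cluster:monitor/prometheus/metrics"
    - "cluster:monitor/health"
    - "cluster:monitor/nodes/info"
    - "cluster:monitor/nodes/stats"
    - "cluster:monitor/state"
---
# roles_mapping.yml (security plugin): map the role to the user name
# that is taken from the certificate's common name
prometheus:
  users:
    - "prometheus"                   # hypothetical user name, must equal the CN
---
# config.yml (security plugin): client certificate authentication domain
config:
  dynamic:
    authc:
      clientcert_auth_domain:
        http_enabled: true
        transport_enabled: false
        order: 1                     # assumed position among your auth domains
        http_authenticator:
          type: clientcert
          config:
            username_attribute: cn   # use the common name as the user name
          challenge: false
        authentication_backend:
          type: noop
---
# Kubernetes manifest: client certificate issued by cert-manager
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: opensearch-prometheus-http-crt
  namespace: opensearch
spec:
  secretName: opensearch-prometheus-http-crt   # Secret referenced by the ServiceMonitor
  commonName: prometheus                       # must equal the user name above
  usages:
    - client auth
  issuerRef:
    name: opensearch-ca-issuer                 # hypothetical issuer
    kind: Issuer
```

As far as I know, client certificate authentication on the HTTP layer also needs `plugins.security.ssl.http.clientauth_mode` set to `OPTIONAL` (or `REQUIRE`) in opensearch.yml, and the issuing CA must be trusted by OpenSearch.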

Unfortunately, I haven't found a solution around the need for using insecureSkipVerify since Prometheus will complain with x509: certificate is valid for 127.0.0.1 ... as the Pod IP address isn't part of the allowed IP addresses in the certificate used for exposing OpenSearch in the cluster, but I am fine with it for now.
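
One option that might avoid this, which I have not verified in this setup, is the `serverName` field of `tlsConfig`: it tells Prometheus which name to check the server certificate against, instead of the Pod IP. The sketch below is a fragment of `spec.endpoints` and assumes that the OpenSearch HTTP certificate contains a DNS SAN matching the (hypothetical) name used here, and that `ca.crt` in the Secret is the CA which signed that certificate.

```yaml
# fragment of spec.endpoints in the ServiceMonitor above
- path: /_prometheus/metrics
  port: http
  scheme: https
  tlsConfig:
    # hypothetical DNS name; must match a SAN of the OpenSearch HTTP certificate
    serverName: opensearch.opensearch.svc
    ca:
      secret:
        name: opensearch-prometheus-http-crt
        key: ca.crt
    cert:
      secret:
        name: opensearch-prometheus-http-crt
        key: tls.crt
    keySecret:
      name: opensearch-prometheus-http-crt
      key: tls.key
```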

What do you think of this solution @rursprung and @lukas-vlcek ?

rursprung commented 1 year ago

thanks for your feedback @AndersBennedsgaard!

that sounds like an interesting approach!

> Unfortunately, I haven't found a solution around the need for using insecureSkipVerify since Prometheus will complain with x509: certificate is valid for 127.0.0.1 ... as the Pod IP address isn't part of the allowed IP addresses in the certificate used for exposing OpenSearch in the cluster, but I am fine with it for now.

if you're using the CSI driver from cert-manager to issue certificates then you should be able to include the specific IP of the pod in the certificate (i haven't tried that myself).

AndersBennedsgaard commented 1 year ago

@rursprung I have considered using the CSI driver, but I'm not really comfortable introducing it to production since it does not seem entirely stable yet. But a good suggestion, which we will take into consideration in the future :smiley:

Aiven-Open / prometheus-exporter-plugin-for-opensearch