elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

Prometheus Input - auto discovery and leader election in fleet #4126

Open Alphayeeeet opened 9 months ago

Alphayeeeet commented 9 months ago

Describe the enhancement: When deploying Elastic Agent as a DaemonSet into Kubernetes, I also want to scrape metrics from custom workloads. In standalone mode, Prometheus endpoints can be autodiscovered by defining labels or annotations. Fleet, with its integrations for Prometheus and the new Prometheus Input, should offer the same possibilities.

Additionally, if autodiscovery or any metric scraping in general is enabled, each agent in the DaemonSet tries to scrape those metrics, and the data gets ingested multiple times. This should be avoided by using Kubernetes leader election, or even better, by load-balancing the scraping tasks so that each distinct endpoint is assigned to a distinct agent. In that case, the load would be distributed more evenly.

Describe a specific use case for the enhancement or feature: Cloud-native monitoring currently requires you to deploy a Prometheus server instance and use remote write to a dedicated Elastic Agent. Since Elastic Agent already provides scraping capability, it should also be able to scrape metrics directly from distributed workloads. The endpoints can be configured using autodiscovery, providing the necessary endpoint information via labels/annotations.

What is the definition of done? The Prometheus integration and Prometheus Input should be extended to support the requirements above. In the Fleet UI, it should be possible to configure variables that are read from labels/annotations. A condition also needs to be set (e.g. a label which signals that this pod exposes metrics).

There should also be a way to avoid duplicate data when using this integration in a distributed environment. Could this be done by setting a condition on the leader status from the Kubernetes integration?

Please explain what should go into the conditions to validate this change, and I will add them.

Alphayeeeet commented 8 months ago

Leader election is already possible in the Prometheus integration. It should also be configurable for the technical-preview Prometheus Input.

Rerouting to different datasets and namespaces should also be possible based on Kubernetes annotations, as it already is for Kubernetes container logs.
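For context, the container-logs behaviour I am referring to is driven by pod annotations along these lines; the elastic.co/dataset and elastic.co/namespace annotation names are quoted from memory and should be treated as an assumption to verify against the Kubernetes integration docs:

```yaml
# Illustrative pod with per-pod rerouting annotations for container logs.
# Annotation names are assumed from the Kubernetes integration's routing
# feature and may differ; the pod name and values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    elastic.co/dataset: my-app.logs    # target dataset for this pod's logs
    elastic.co/namespace: my-team      # target data stream namespace
spec:
  containers:
    - name: my-app
      image: my-app:latest             # hypothetical image
```

Something equivalent for Prometheus metrics would cover my use case.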

Alphayeeeet commented 4 months ago

@cmacknz Any updates by now?

pierrehilbert commented 4 months ago

Hello @Alphayeeeet, sorry for the delay here. @gizas would you have time to take a look? cc @bturquet

gizas commented 4 months ago

Thanks @Alphayeeeet, let me try to kick off the discussion with some basic information:

  1. This is how hints/annotations-based autodiscovery is configured in standalone Elastic Agent.
  2. This is the specific template for the Prometheus integration, which needs to be either mounted or added to the agent's inputs config.
  3. Then, by using co.elastic.hints/package: prometheus, autodiscovery will apply to the pods that have the specific annotations configured (see the sketch below).

NOTE: The hints autodiscovery is based on annotations.
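For illustration, a pod that should be picked up by hints autodiscovery would carry annotations roughly like the following; the pod name, image, port, and period are placeholders, while the co.elastic.hints/* keys mirror the ones used in the agent config further below:

```yaml
# Sketch of a workload pod annotated for hints autodiscovery.
# Name, image, port and period are placeholders for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: my-exporter
  annotations:
    co.elastic.hints/package: prometheus   # selects the prometheus template
    co.elastic.hints/port: "8080"          # port exposing the /metrics endpoint
    co.elastic.hints/period: 10s           # scrape interval
spec:
  containers:
    - name: my-exporter
      image: my-exporter:latest            # hypothetical image
```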

For the second thing you mention, leader election: indeed there is an election mechanism, but for the collector type only. This is the condition: `${kubernetes_leaderelection.leader} == true`
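A minimal sketch of how that condition attaches to a standalone prometheus/metrics stream; the input id, endpoint, and period are placeholders, and the kubernetes_leaderelection provider is assumed to be enabled in the agent config:

```yaml
# Sketch: only the agent currently holding the leader lease runs this stream.
inputs:
  - id: prometheus/metrics-leader-only          # placeholder id
    type: prometheus/metrics
    use_output: default
    streams:
      - data_stream:
          dataset: prometheus.collector
          type: metrics
        metricsets:
          - collector
        hosts:
          - http://prometheus-server.monitoring.svc:9090   # hypothetical endpoint
        period: 10s
        condition: ${kubernetes_leaderelection.leader} == true
```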

The remote write configuration currently does not support leader election. But you can always set conditions like `condition: ${kubernetes.container.name} == 'prometheus-server'` to specify where to find the Prometheus container/application.

Note: Condition-based autodiscovery takes priority over hints autodiscovery. The same conditions can of course be used in the Prometheus collector configuration.

For remote write the logic is different (the server pushes the data), which is why the notion of a leader does not make sense there. But if you wish to scale Elastic Agents with remote_write configured, you can run multiple agents behind a Kubernetes Service and have Prometheus remote write send to that service. Let me know if you are also interested in this scenario and I can provide the info as well; I don't think it is so relevant at the moment.
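A rough sketch of that scaling setup, assuming the agents run the prometheus remote_write metricset listening on its default port 9201 (service name, namespace, and labels are placeholders):

```yaml
# Kubernetes Service fronting several Elastic Agents that accept remote_write.
# The port 9201 and the /write path below are the documented defaults for the
# prometheus remote_write metricset; verify them for your version.
apiVersion: v1
kind: Service
metadata:
  name: elastic-agent-remote-write     # hypothetical service name
  namespace: monitoring
spec:
  selector:
    app: elastic-agent                 # must match the agent pod labels
  ports:
    - name: remote-write
      port: 9201
      targetPort: 9201
```

Prometheus would then point its remote_write at that service:

```yaml
# prometheus.yml snippet (URL derived from the hypothetical service above)
remote_write:
  - url: http://elastic-agent-remote-write.monitoring.svc:9201/write
```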

Please let me know if the above is ok and matches the details of your setup.

Alphayeeeet commented 4 months ago

Hi @gizas,

Thank you for the update. Unfortunately those docs only cover standalone Elastic Agents; I need the same configuration for fleet-managed agents. And even if autodiscovery were possible, we would still have the rerouting issue: we want to change the namespace of the metrics depending on the pod annotations, like it is possible with kubernetes.container_logs.

I hope it's a bit clearer now.

Alphayeeeet commented 4 months ago

Another topic is adding the kubernetes.* fields. I have some issues while trying to use the add_kubernetes_metadata processor: it doesn't add any metadata at all, although I have it configured as documented here: https://www.elastic.co/guide/en/beats/metricbeat/current/add-kubernetes-metadata.html

I need the metadata fields as they're added by the Kubernetes integration. I believe that's what the processor should do.
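For reference, a minimal add_kubernetes_metadata configuration along the lines of the linked docs looks roughly like this; the host variable and the lookup field are illustrative and may need adjusting to the fields actually present in the events:

```yaml
# Minimal sketch of the add_kubernetes_metadata processor, per the linked docs.
# The lookup field below is illustrative; it must contain the <pod_ip>:<port>
# value that the ip_port indexer can match against.
processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}                       # node the agent runs on
      default_indexers.enabled: false
      default_matchers.enabled: false
      indexers:
        - ip_port:
      matchers:
        - fields:
            lookup_fields: ["metricset.host"]
```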

Alphayeeeet commented 4 months ago

Actually, my desired scenario would be hints-based autodiscovery in fleet-managed Elastic Agents for logs (like in #5015) and for metrics too. It should be possible to configure which integration/metricset is used to parse those metrics/logs afterwards, and for the generic ones (custom logs or Prometheus collector) to configure the dataset and provide custom ingest pipelines. If that makes sense, I would suggest opening a new ticket on this topic.

gizas commented 4 months ago

Thanks for the clarifications @Alphayeeeet. Hints-based autodiscovery is not supported in fleet-managed agents. (This is an old feature request, here, which we decided to put on hold for now.)

> We want to change the namespace of the metrics depending on the pod annotations, like it is possible with kubernetes.container_logs.

Condition-based autodiscovery, on the other hand, works for both managed and standalone. So what you want to achieve, a different namespace as with kubernetes.container_logs, is to change the dataset name like in the picture, right?

[screenshot illustrating the dataset name change]

If yes, changing the dataset name has been supported in the Prometheus integration since v1.3.0 (and this PR).

> Another topic is adding the kubernetes.* fields. I have some issues while trying to use the add_kubernetes_metadata processor: it doesn't add any metadata at all

In your case you should make use of the kubernetes provider to achieve the metadata enrichment:

Sample of Elastic Agent config:

```yaml
elastic-agent.yml: |-
  ...truncated...
  agent:
    ...truncated...
  providers:
    kubernetes:
      node: ${NODE_NAME}
      scope: node
      add_resource_metadata:
        ...truncated...
        deployment: true
  inputs:
    - id: prometheus/metrics-prometheus-${kubernetes.pod.name}-${kubernetes.container.id}
      type: prometheus/metrics
      processors:
        - add_fields:
            target: orchestrator
            fields:
              cluster.name: {{ .Values.global.cluster_name }}
              cluster.role: {{ .Values.global.cluster_role }}
              platform.type: {{ .Values.global.platform_type }}
              source.agent: elastic-agent
      use_output: metrics
      meta:
        package:
          name: prometheus
          version: 1.1.0
      streams:
        - id: prometheus-metrics-prometheus-${kubernetes.pod.name}-${kubernetes.container.id}
          condition: ${kubernetes.labels.co.elastic.hints/package} == "prometheus"
          data_stream:
            dataset: prometheus.collector
            namespace: ${kubernetes.labels.app.kubernetes.io/name|'default'}
            type: metrics
          hosts:
            - ${kubernetes.labels.co.elastic.hints/protocol|'http'}://${kubernetes.pod.ip}:${kubernetes.labels.co.elastic.hints/port|'8080'}
          metrics_filters.exclude: null
          metrics_filters.include: null
          metrics_path: /metrics
          metricsets:
            - collector
          period: ${kubernetes.labels.co.elastic.hints/period|'10s'}
          ssl.verification_mode: ${kubernetes.labels.co.elastic.hints/sslVerificationMode|'full'}
          rate_counters: true
          use_types: true
        - id: prometheus-metrics-prometheus-${kubernetes.pod.name}-${kubernetes.container.id}
          condition: ${kubernetes.annotations.co.elastic.hints/package} == "prometheus"
          data_stream:
            dataset: prometheus.collector
            namespace: ${kubernetes.labels.app.kubernetes.io/name|'default'}
            type: metrics
          hosts:
            - ${kubernetes.annotations.co.elastic.hints/protocol|'http'}://${kubernetes.pod.ip}:${kubernetes.annotations.co.elastic.hints/port|'8080'}
          metrics_filters.exclude: null
          metrics_filters.include: null
          metrics_path: /metrics
          metricsets:
            - collector
          period: ${kubernetes.annotations.co.elastic.hints/period|'10s'}
          ssl.verification_mode: ${kubernetes.annotations.co.elastic.hints/sslVerificationMode|'full'}
          rate_counters: true
          use_types: true
```

In the example above you can see that the namespace is configured based on the label: `namespace: ${kubernetes.labels.app.kubernetes.io/name|'default'}`. You need to reference a kubernetes.* variable in the Prometheus config so that the kubernetes provider can trigger the match and apply the autodiscovery.
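To make the matching concrete, a pod that satisfies the label-based stream above might carry metadata along these lines; the names and values are placeholders apart from the hint keys themselves:

```yaml
# Sketch of pod labels matching the label-based stream in the config above.
# app.kubernetes.io/name feeds the data_stream namespace; values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: my-exporter
  labels:
    app.kubernetes.io/name: my-team            # becomes the data_stream namespace
    co.elastic.hints/package: prometheus       # matches the stream condition
    co.elastic.hints/port: "8080"              # port used in the hosts template
spec:
  containers:
    - name: my-exporter
      image: my-exporter:latest                # hypothetical image
```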

> It should be possible to configure which integration/metricset is used to parse those metrics/logs afterwards.

It sounds like you are trying to achieve both logs and metrics ingestion. You can always install two integrations: the Kubernetes integration with only logs enabled, and the Prometheus one to collect the metrics you want.

Alphayeeeet commented 4 months ago

Thank you @gizas, I will try it out and raise any upcoming issues here in the conversation.

Alphayeeeet commented 4 months ago

@gizas I would suggest moving the discussion and troubleshooting part to the forum: https://discuss.elastic.co/t/fleet-managed-elastic-agent-kubernetes-prometheus-metrics-autodiscover/362808

Alphayeeeet commented 4 months ago

@gizas Could you please provide some assistance? I replied to the discussion with the errors I received after configuring the Prometheus integration policy.

Alphayeeeet commented 3 months ago

@gizas As an update: I finally figured out how to configure the Prometheus integration in my Kubernetes environment. However, for the rerouting feature, special API key permissions are needed, for which I opened a PR: https://github.com/elastic/integrations/pull/10592