elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Metricbeat module beat-xpack leads to incorrect cluster in Stack Monitoring #30652

Open mag-mkorn opened 2 years ago

mag-mkorn commented 2 years ago

When monitoring data is ingested via the Metricbeat module beat-xpack, an additional cluster named Standalone Cluster shows up in Stack Monitoring.

Investigating the data shows that the beat.state dataset is missing the field cluster_uuid.
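
A search along these lines (an untested sketch on my side) should surface exactly the beat.state documents that lack a cluster_uuid:

GET .monitoring-beats-8-mb/_search
{
  "query": {
    "bool": {
      "filter": { "match": { "event.dataset": "beat.state" } },
      "must_not": { "exists": { "field": "cluster_uuid" } }
    }
  }
}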

I suspect that the missing cluster_uuid leads to the additional cluster in the monitoring view.

To back this hypothesis, I removed only the beat.state data with a delete-by-query:

POST .monitoring-beats-8-mb/_delete_by_query
{
  "query": {
    "match": {
      "event.dataset": "beat.state"
    }
  }
}

After this delete the faulty cluster was gone. IMO the cluster_uuid should be provided by the beat.state dataset.

botelastic[bot] commented 2 years ago

Thank you very much for creating this issue. However, we would kindly like to ask you to post all questions and issues on the Discuss forum first. In addition to awesome, knowledgeable community contributors, core Beats developers are on the forums every single day to help you out as well. So, your questions will reach a wider audience there, and if we confirm that there is a bug, then you can reopen this issue with the new information or open a new one.

elasticmachine commented 2 years ago

Pinging @elastic/integrations (Team:Integrations)

piotrp commented 2 years ago

This happens when Metricbeat is monitoring Filebeat: the correct cluster_uuid is not written, and the only workaround is to set it in the Filebeat configuration (monitoring.cluster_uuid).
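
For example (a minimal sketch; the UUID value is a placeholder for the monitored cluster's actual cluster_uuid, which GET / on Elasticsearch returns):

# filebeat.yml
monitoring.cluster_uuid: "PRODUCTION-CLUSTER-UUID"  # placeholder value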

A similar thing happens with Logstash (logstash-xpack), but there Metricbeat seems to partially apply its cluster_uuid:

* only the first of my three pipelines is visible in Kibana under correct cluster
* all of my three pipelines are visible under Standalone Cluster

Tested on 7.17.5 (of all involved applications).

Having Metricbeat autodetect cluster_uuid and apply it consistently to gathered metrics would make deploying Metricbeat much simpler. Currently the recommended setup is cumbersome and requires much more work than the legacy one, which worked out of the box.
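
For reference, the recommended setup is roughly the following Metricbeat module configuration (a sketch along the lines of the Elastic docs; the host value is an example). On top of it, monitoring.cluster_uuid still has to be set per monitored application:

# metricbeat.yml
metricbeat.modules:
  - module: logstash
    metricsets:
      - node
      - node_stats
    period: 10s
    hosts: ["localhost:9600"]  # example Logstash monitoring API endpoint
    xpack.enabled: true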

piotrp commented 2 years ago

There are no changes in 8.3.2

rpasche commented 2 years ago

@piotrp Thank you for these hints. I was just about to file an Elastic support case, because I also found out that some of my pipelines are no longer showing up in Stack Monitoring. Your hint about the "Standalone Cluster" opened my eyes here. I had been wondering why that standalone cluster has ~800 Logstash instances (running 8.2.3). Still checking...

rpasche commented 2 years ago

@piotrp You are saying:

  • only the first of my three pipelines is visible in Kibana under correct cluster
  • all of my three pipelines are visible under Standalone Cluster

Can you confirm that the pipelines you correctly see under your cluster are in fact using the elasticsearch output plugin somewhere? And, on the other hand, that all pipelines that seem to be assigned to the Standalone Cluster are ones that do not use the elasticsearch output plugin anywhere?

I started to trace this issue in our environment as well, and while looking into the logstash module of Metricbeat, I found the API call _node/pipelines?graph=true. Metricbeat then seems to extract the cluster_uuid from the structure monitoring.cluster_uuid (if found), or it uses the cluster_uuid found within a vertex (see https://github.com/elastic/beats/blob/main/metricbeat/module/logstash/logstash.go#L157). But I just noticed that only our pipelines that use the elasticsearch output plugin (of Logstash) contain such a cluster_uuid within a vertex. So all our pipelines that do not use the elasticsearch output plugin have no cluster_uuid, and this causes their events to be assigned to the standalone cluster.
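
For illustration, the relevant part of the response looks roughly like this (an abbreviated, hypothetical sketch rendered as YAML for readability; the API returns JSON, and the field names other than cluster_uuid are reproduced from memory):

# Hypothetical, trimmed shape of the _node/pipelines?graph=true response
pipelines:
  my-pipeline:                       # hypothetical pipeline id
    graph:
      graph:
        vertices:
          - type: plugin
            plugin_type: output
            config_name: elasticsearch
            cluster_uuid: "AbC123..."  # only elasticsearch output vertices carry this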

At least, looking at our Stack Monitoring, the pipeline names found there under the standalone cluster suggest that this theory fits. But I still need to verify it.

I am also pretty sure that we saw all the other pipelines in our Stack Monitoring (assigned to their correct clusters) while we were still running version 7.17.1 (we ran that until ~2 weeks ago). I will test setting monitoring.cluster_uuid in logstash.yml to see if our "missing" pipelines then show up under the correct cluster (https://www.elastic.co/guide/en/logstash/current/monitoring-with-metricbeat.html#define-cluster__uuid).
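
Concretely, that would be a one-line addition (a sketch; the UUID value is a placeholder for the monitored cluster's actual cluster_uuid):

# logstash.yml
monitoring.cluster_uuid: "PRODUCTION-CLUSTER-UUID"  # placeholder value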

rpasche commented 2 years ago

And if I read this correctly (https://github.com/elastic/logstash/blob/55e7a2641616b6c94e1390a6a921b54650ecd5a3/x-pack/lib/monitoring/inputs/metrics.rb#L227): when a cluster_uuid is found (because the elasticsearch output plugin is used), that one is used. If the cluster_uuid is missing and monitoring.cluster_uuid is set in the config, the config setting is used. I would also expect to see log messages like @logger.info("Found cluster_uuids from elasticsearch output plugins", :cluster_uuids => cluster_uuids)

piotrp commented 2 years ago

I don't have time to test this now, but only one of my three pipelines is using the elasticsearch output, the remaining two use a custom one.

rpasche commented 2 years ago

@piotrp Setting monitoring.cluster_uuid in logstash.yml seems to fix the issue. I have now tested this in one example deployment in one of our clusters, and the "missing" pipelines (the ones without any elasticsearch output plugin usage) are visible again within the cluster in which Logstash is running.

But still... we didn't use that setting in our environment when running 7.17.1, and I am very sure that we saw those pipelines the whole time back then.

rpasche commented 2 years ago

So... at least it seems partially fixed. I still see a lot of events within the .monitoring-logstash-8-mb.. indices that do not have a cluster_uuid set and are thus assigned to the standalone cluster. But most of them seem to be node_stats metrics, not node.
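
A terms aggregation over the documents missing a cluster_uuid (a sketch; the index pattern is an assumption) shows which metricsets are affected:

GET .monitoring-logstash-8-mb-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": { "exists": { "field": "cluster_uuid" } }
    }
  },
  "aggs": {
    "by_metricset": {
      "terms": { "field": "metricset.name" }
    }
  }
}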

mag-mkorn commented 1 year ago

> This happens when Metricbeat is monitoring Filebeat: the correct cluster_uuid is not written, and the only workaround is to set it in the Filebeat configuration (monitoring.cluster_uuid).
>
> A similar thing happens with Logstash (logstash-xpack), but there Metricbeat seems to partially apply its cluster_uuid:
>
> * only the first of my three pipelines is visible in Kibana under correct cluster
> * all of my three pipelines are visible under Standalone Cluster
>
> Tested on 7.17.5 (of all involved applications).
>
> Having Metricbeat autodetect cluster_uuid and apply it consistently to gathered metrics would make deploying Metricbeat much simpler. Currently the recommended setup is cumbersome and requires much more work than the legacy one, which worked out of the box.

monitoring.cluster_uuid is set on both Filebeat and Metricbeat; without it, none of the events would be associated with the cluster. Exactly this is the problem: although the UUID is configured, some of the Filebeat monitoring metrics do not contain it in their events.

As of version 8.4.1 the problem still exists.

mag-mkorn commented 1 year ago

Still the same as of 8.6.2.

botelastic[bot] commented 7 months ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!

mag-mkorn commented 7 months ago

Still relevant.