elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Fix the telemetry collection of Logstash with metricbeat monitoring. #182304

Closed mashhurs closed 1 week ago

mashhurs commented 2 weeks ago

Summary

Telemetry data collection is broken for Logstash when it is monitored with Metricbeat. This PR addresses the following issues:

1) Resolve cluster UUID

2) `type` field mismatch in queries (especially the state queries), as well as the collapse field

3) Logstash state data frequency
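
To illustrate the second item, here is a minimal sketch of a filter that matches a document whether it was written by legacy internal collection or by Metricbeat. It is not taken from this PR: the helper name `buildTypeFilter` is hypothetical, and the field names (`type` for legacy documents, `metricset.name` for Metricbeat documents) and the example values are assumptions about how the two collection modes label their documents.

```typescript
// Hypothetical helper, not the PR's actual code.
// Assumption: legacy documents carry a top-level `type` field, while
// Metricbeat documents carry `metricset.name` instead.
interface TermClause {
  term: Record<string, string>;
}

interface TypeFilter {
  bool: {
    should: TermClause[];
    minimum_should_match: number;
  };
}

interface DocTypeNames {
  legacyType: string; // e.g. 'logstash_state' (assumed value)
  metricset: string; // e.g. 'node' (assumed value)
}

function buildTypeFilter({ legacyType, metricset }: DocTypeNames): TypeFilter {
  return {
    bool: {
      // A document matches if either naming scheme applies to it.
      should: [
        { term: { type: legacyType } },
        { term: { 'metricset.name': metricset } },
      ],
      minimum_should_match: 1,
    },
  };
}

const stateFilter = buildTypeFilter({ legacyType: 'logstash_state', metricset: 'node' });
console.log(JSON.stringify(stateFilter, null, 2));
```

Filtering on only one of the two fields would silently drop the documents produced by the other collection mode, which is one way a "type field mismatch" can show up as missing telemetry.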


apmmachine commented 2 weeks ago

:robot: GitHub comments

Just comment with:

- `/oblt-deploy` : Deploy a Kibana instance using the Observability test environments.
- `run` `docs-build` : Re-trigger the docs validation. (use unformatted text in the comment!)

mashhurs commented 2 weeks ago

FYI: I have updated the unit test cases to align with the current changes, and will try to add tests for metricbeat.

afharo commented 1 week ago

Hmm, the failing tests indicate that we're somehow not returning the monitoring data... Maybe the new query is failing?

UPDATE: Found it in the logs:

[00:00:06]           │ proc [kibana] [2024-05-08T00:55:24.693+00:00][WARN ][plugins.usageCollection.usage-collection.collector-set] ResponseError: search_phase_execution_exception
[00:00:06]           │ proc [kibana]  Caused by:
[00:00:06]           │ proc [kibana]      illegal_argument_exception: no mapping found for `logstash.node.stats.logstash.uuid` in order to collapse on
[00:00:06]           │ proc [kibana]  Root causes:
[00:00:06]           │ proc [kibana]      illegal_argument_exception: no mapping found for `logstash.node.stats.logstash.uuid` in order to collapse on
[00:00:06]           │ proc [kibana]     at KibanaTransport.request (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@elastic/transport/lib/Transport.js:492:27)
[00:00:06]           │ proc [kibana]     at processTicksAndRejections (node:internal/process/task_queues:95:5)
[00:00:06]           │ proc [kibana]     at KibanaTransport.request (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/core-elasticsearch-client-server-internal/src/create_transport.js:51:16)
[00:00:06]           │ proc [kibana]     at ClientTraced.SearchApi [as search] (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@elastic/elasticsearch/lib/api/api/search.js:66:12)
[00:00:06]           │ proc [kibana]     at fetchLogstashStats (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/monitoring-plugin/server/telemetry_collection/get_logstash_stats.js:225:19)
[00:00:06]           │ proc [kibana]     at getLogstashStats (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/monitoring-plugin/server/telemetry_collection/get_logstash_stats.js:312:5)
[00:00:06]           │ proc [kibana]     at async Promise.all (index 2)
[00:00:06]           │ proc [kibana]     at getAllStats (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/monitoring-plugin/server/telemetry_collection/get_all_stats.js:34:49)
[00:00:06]           │ proc [kibana]     at async Promise.all (index 1)
[00:00:06]           │ proc [kibana]     at Collector.fetch (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/monitoring-plugin/server/telemetry_collection/register_monitoring_telemetry_collection.js:227:33)
[00:00:06]           │ proc [kibana]     at CollectorSet.fetchCollector (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/usage-collection-plugin/server/collector/collector_set.js:141:24)
[00:00:06]           │ proc [kibana]     at fetch_monitoringTelemetry (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/usage-collection-plugin/server/collector/collector_set.js:175:103) {"service":{"node":{"roles":["background_tasks","ui"]}}}
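
The error in the log above means the search asked Elasticsearch to collapse on a field that has no mapping in the target indices. One possible guard, sketched below, is to attach `collapse` only when a candidate field is known to be mapped. This is an illustration, not the fix applied in this PR: `pickCollapseField` and `withCollapse` are hypothetical helpers, the legacy field name is an assumption, and how the set of mapped fields is obtained is left out.

```typescript
// Hypothetical sketch, not the PR's actual fix.
// Candidate UUID fields in priority order. The second one is the field named
// in the log above; the first is an assumed legacy equivalent.
const UUID_FIELD_CANDIDATES: readonly string[] = [
  'logstash_stats.logstash.uuid',
  'logstash.node.stats.logstash.uuid',
];

interface SearchBody {
  query: object;
  collapse?: { field: string };
}

// Return the first candidate that is present in the set of mapped fields,
// or undefined if none of them is mapped.
function pickCollapseField(
  mappedFields: ReadonlySet<string>,
  candidates: readonly string[] = UUID_FIELD_CANDIDATES
): string | undefined {
  return candidates.find((field) => mappedFields.has(field));
}

// Attach `collapse` only when a mapped field was found, so the request never
// collapses on an unmapped field.
function withCollapse(body: SearchBody, field: string | undefined): SearchBody {
  return field === undefined ? body : { ...body, collapse: { field } };
}

const mapped = new Set(['logstash.node.stats.logstash.uuid']);
const body = withCollapse({ query: { match_all: {} } }, pickCollapseField(mapped));
console.log(JSON.stringify(body));
```

When no candidate is mapped, the body is returned unchanged and the caller has to deduplicate results some other way.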
mashhurs commented 1 week ago

@elasticmachine merge upstream

mashhurs commented 1 week ago

LGTM! This is great! Thanks for such an effort!

Thank you so much @afharo. This happened because of your huge help, much appreciated!

kibana-ci commented 1 week ago

:green_heart: Build Succeeded

Metrics [docs]

Unknown metric groups

#### ESLint disabled line counts

| id | [before](https://github.com/elastic/kibana/commit/bc103c7016245901a04fc4921c1b213a4fbe2695) | [after](https://github.com/elastic/kibana/commit/37f39121157efaad3c6d2d25e3a18e033d4ca99c) | diff |
| --- | --- | --- | --- |
| `monitoring` | 18 | 20 | +2 |

#### Total ESLint disabled count

| id | [before](https://github.com/elastic/kibana/commit/bc103c7016245901a04fc4921c1b213a4fbe2695) | [after](https://github.com/elastic/kibana/commit/37f39121157efaad3c6d2d25e3a18e033d4ca99c) | diff |
| --- | --- | --- | --- |
| `monitoring` | 25 | 27 | +2 |

History

To update your PR or re-run it, just comment with: @elasticmachine merge upstream

cc @mashhurs

mashhurs commented 6 days ago

@afharo, @neptunian can we please backport this change to upcoming 8.14.x releases?

afharo commented 5 days ago

I've added the appropriate label to backport this PR to the previous minor.

Did the same with https://github.com/elastic/kibana/pull/182857

Hopefully, our kibanamachine bot backports them for us.

kibanamachine commented 5 days ago

💚 All backports created successfully

Status Branch Result
8.14

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

miltonhultgren commented 5 days ago

@afharo Thank you so much for sharing all your knowledge here and getting this done!