Closed mashhurs closed 1 week ago
FYI: I have updated the unit test cases to align with the current changes, and will try to add some for metricbeat.
Hmm, the failing tests indicate that we're somehow not returning the monitoring data... Maybe the new query is failing?
UPDATE: Found it in the logs:
[00:00:06] │ proc [kibana] [2024-05-08T00:55:24.693+00:00][WARN ][plugins.usageCollection.usage-collection.collector-set] ResponseError: search_phase_execution_exception
[00:00:06] │ proc [kibana] Caused by:
[00:00:06] │ proc [kibana] illegal_argument_exception: no mapping found for `logstash.node.stats.logstash.uuid` in order to collapse on
[00:00:06] │ proc [kibana] Root causes:
[00:00:06] │ proc [kibana] illegal_argument_exception: no mapping found for `logstash.node.stats.logstash.uuid` in order to collapse on
[00:00:06] │ proc [kibana] at KibanaTransport.request (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@elastic/transport/lib/Transport.js:492:27)
[00:00:06] │ proc [kibana] at processTicksAndRejections (node:internal/process/task_queues:95:5)
[00:00:06] │ proc [kibana] at KibanaTransport.request (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/core-elasticsearch-client-server-internal/src/create_transport.js:51:16)
[00:00:06] │ proc [kibana] at ClientTraced.SearchApi [as search] (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@elastic/elasticsearch/lib/api/api/search.js:66:12)
[00:00:06] │ proc [kibana] at fetchLogstashStats (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/monitoring-plugin/server/telemetry_collection/get_logstash_stats.js:225:19)
[00:00:06] │ proc [kibana] at getLogstashStats (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/monitoring-plugin/server/telemetry_collection/get_logstash_stats.js:312:5)
[00:00:06] │ proc [kibana] at async Promise.all (index 2)
[00:00:06] │ proc [kibana] at getAllStats (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/monitoring-plugin/server/telemetry_collection/get_all_stats.js:34:49)
[00:00:06] │ proc [kibana] at async Promise.all (index 1)
[00:00:06] │ proc [kibana] at Collector.fetch (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/monitoring-plugin/server/telemetry_collection/register_monitoring_telemetry_collection.js:227:33)
[00:00:06] │ proc [kibana] at CollectorSet.fetchCollector (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/usage-collection-plugin/server/collector/collector_set.js:141:24)
[00:00:06] │ proc [kibana] at fetch_monitoringTelemetry (/var/lib/buildkite-agent/builds/kb-n2-4-spot-a9d9a28162911021/elastic/kibana-pull-request/kibana-build-xpack/node_modules/@kbn/usage-collection-plugin/server/collector/collector_set.js:175:103) {"service":{"node":{"roles":["background_tasks","ui"]}}}
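For context, the exception above means the search asked Elasticsearch to `collapse` on a field that has no mapping in the target indices. One way to guard against that is sketched below, assuming the caller already knows which fields are mapped (for example from a field capabilities lookup). `pickCollapseField`, `buildStatsSearchBody` and `COLLAPSE_CANDIDATES` are hypothetical names for illustration, not the code in this PR, and the second candidate field name is an assumption about the legacy data shape.

```typescript
// Hypothetical helper, not Kibana's actual code: return the first candidate
// field that is really mapped in the target indices, or undefined if none is.
function pickCollapseField(
  mappedFields: ReadonlySet<string>,
  candidates: readonly string[]
): string | undefined {
  return candidates.find((field) => mappedFields.has(field));
}

interface SearchBody {
  size: number;
  query: Record<string, unknown>;
  collapse?: { field: string };
}

// Build the search body, adding `collapse` only when a mapped field was found,
// so the request cannot fail with "no mapping found ... in order to collapse on".
function buildStatsSearchBody(
  query: Record<string, unknown>,
  collapseField: string | undefined,
  size: number
): SearchBody {
  const body: SearchBody = { size, query };
  if (collapseField !== undefined) {
    body.collapse = { field: collapseField };
  }
  return body;
}

// The first name is taken from the log above; the second is an assumed
// legacy field name and may not match the real legacy mapping.
const COLLAPSE_CANDIDATES: readonly string[] = [
  'logstash.node.stats.logstash.uuid',
  'logstash_stats.logstash.uuid',
];
```

With this shape, a cluster that only has one of the two mappings still gets a valid query, and a cluster with neither gets an uncollapsed query instead of a `search_phase_execution_exception`.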
@elasticmachine merge upstream
LGTM! This is great! Thanks for such an effort!
Thank you so much @afharo. This happened thanks to your huge help, much appreciated!
cc @mashhurs
@afharo, @neptunian can we please backport this change to upcoming 8.14.x releases?
I've added the appropriate label to backport this PR to the previous minor.
Did the same with https://github.com/elastic/kibana/pull/182857
Hopefully, our kibanamachine bot backports them for us.
| Status | Branch | Result |
|---|---|---|
| ✅ | 8.14 | |
Note: Successful backport PRs will be merged automatically after passing CI.
Please refer to the Backport tool documentation
@afharo Thank you so much for sharing all your knowledge here and getting this done!
Summary
Telemetry data collection is broken for Logstash when monitoring with metricbeat. This PR covers the following issues:
1) Resolve cluster UUID
- The `.monitoring-es*` index mapping has a `type` field that defaults to the `type:cluster_state` key-value, and the `type:cluster_state` condition is used when fetching the cluster UUID.
- The `type` field doesn't exist in the mapping that metricbeat creates, so the query returns empty output and the cluster UUID is not resolved.
2) `type` field mismatch in queries (especially the state queries), plus an incorrect collapse field
- `metricset.name:node`: the state fetch query ignores this condition and instead uses the legacy `type:logstash_state` condition, which is incorrect.
- The `collapse` field is not correct: the data shape changed from legacy to metricbeat monitoring, and the queries are tightly coupled with the legacy shape (1, 2, 3).
- `filter_path` is also not correct in both the state query and the stats query.
3) Logstash state data frequency
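The `type` versus `metricset.name` mismatch described above can be illustrated with a filter that accepts either data shape. This is a minimal sketch, not the query this PR ships: `legacyOrMetricbeatFilter` is a hypothetical helper, and combining both conditions in one `bool`/`should` clause is an assumption about how the two shapes could be handled together.

```typescript
type QueryClause = Record<string, unknown>;

// Hypothetical helper: match a document by either its legacy `type` value
// or its metricbeat `metricset.name` value.
function legacyOrMetricbeatFilter(legacyType: string, metricsetName: string): QueryClause {
  return {
    bool: {
      should: [
        { term: { type: legacyType } },
        { term: { 'metricset.name': metricsetName } },
      ],
      // At least one of the two conditions has to match.
      minimum_should_match: 1,
    },
  };
}

// State documents: legacy `type:logstash_state`, metricbeat `metricset.name:node`.
const logstashStateFilter = legacyOrMetricbeatFilter('logstash_state', 'node');
```

A filter like `logstashStateFilter` would sit inside the `filter` array of the state query, so documents written by either collection method are found.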
Checklist
Delete any items that are not applicable to this PR.
- [ ] Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
- [ ] Documentation was added for features that require explanation or tutorials
- [ ] Flaky Test Runner was used on any tests changed
- [ ] Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)
- [ ] Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)
- [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
- [ ] This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)
- [ ] This was checked for cross-browser compatibility
Risk Matrix
Delete this section if it is not applicable to this PR.
Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.
When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:
For maintainers