Open awahab07 opened 4 months ago
Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)
This was investigated by @flash1293.
Problem
Elasticsearch can optimize a single terms aggregation, but it cannot do so for nested terms or composite aggregations, so it has to check every document, which is expensive.
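To make the distinction concrete, here is a minimal TypeScript sketch of the three aggregation shapes being compared. The helper names, the aggregation names, and the fields `data_stream.dataset` and `data_stream.namespace` are assumptions for illustration only; they are not copied from the Dataset Quality query.

```typescript
// Hypothetical request-body builders; names and fields are illustrative assumptions.
type Aggs = Record<string, unknown>;

interface Source {
  name: string; // key used for this source in the composite bucket key
  field: string; // document field the source reads from
}

// Single terms aggregation: the shape the issue says Elasticsearch can optimize.
function singleTermsAgg(field: string, size: number): Aggs {
  return { by_field: { terms: { field, size } } };
}

// Nested terms aggregation: one terms aggregation inside another.
function nestedTermsAgg(outer: string, inner: string, size: number): Aggs {
  return {
    by_outer: {
      terms: { field: outer, size },
      aggs: { by_inner: { terms: { field: inner, size } } },
    },
  };
}

// Composite aggregation: paginated buckets over a combination of sources.
function compositeAgg(
  sources: Source[],
  size: number,
  after?: Record<string, string>
): Aggs {
  const composite: Record<string, unknown> = {
    size,
    sources: sources.map((s) => ({ [s.name]: { terms: { field: s.field } } })),
  };
  if (after !== undefined) {
    composite.after = after; // resume from the previous page's after_key
  }
  return { by_sources: { composite } };
}
```

Under these assumptions, `compositeAgg([{ name: "dataset", field: "data_stream.dataset" }, { name: "namespace", field: "data_stream.namespace" }], 1000)` would produce a body of the slow shape, while `singleTermsAgg("data_stream.dataset", 1000)` would produce the shape that can be optimized.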
Ideas to explore
While fetching the degraded docs percentage for data streams, Dataset Quality uses a composite aggregation to produce the following information:
Problem
The query to fetch total documents per data stream per space (see below) is significantly slower on large clusters. It is more than 10 times slower than the ignored documents query (when the _ignored filter is present in the query, see). This is particularly true on clusters that are busy ingesting live logs.
Also, the endpoint issues an extra call to ES to fetch the last, empty page of buckets, which can be prevented.
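One possible way to avoid the extra call is to stop paginating as soon as a page returns fewer buckets than the requested page size. The sketch below assumes a simplified response shape and a hypothetical `fetchPage` callback standing in for the real Elasticsearch request; it is not the endpoint's actual implementation.

```typescript
type AfterKey = Record<string, string>;

// Simplified, assumed shape of one page of composite aggregation results.
interface CompositePage<B> {
  buckets: B[];
  after_key?: AfterKey | undefined;
}

// Returns the key to resume from, or undefined when no further request is needed.
// A page with fewer buckets than requested must be the last one.
function nextAfterKey<B>(page: CompositePage<B>, pageSize: number): AfterKey | undefined {
  if (page.buckets.length < pageSize) {
    return undefined;
  }
  return page.after_key;
}

// Collects all buckets; fetchPage is a hypothetical stand-in for the ES search call.
async function fetchAllBuckets<B>(
  fetchPage: (after: AfterKey | undefined, size: number) => Promise<CompositePage<B>>,
  pageSize: number
): Promise<B[]> {
  const all: B[] = [];
  let after: AfterKey | undefined = undefined;
  do {
    const page: CompositePage<B> = await fetchPage(after, pageSize);
    all.push(...page.buckets);
    after = nextAfterKey(page, pageSize);
  } while (after !== undefined);
  return all;
}
```

With this check, the trailing empty request is skipped in every case except when the total number of buckets is an exact multiple of the page size, where one empty page is still fetched.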
Endpoint
Endpoint: /internal/dataset_quality/data_streams/degraded_docs
Result:
Queries used:
POST /logs-*/_search
_ignored docs per data stream per space: POST /logs-*/_search
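For orientation, the results of the two queries could be combined into a degraded docs percentage roughly as follows. This is a sketch under assumptions: the bucket shape, the `degradedPercentages` name, and the rule that a data stream with no _ignored bucket counts as 0% are illustrative, not taken from the endpoint's code.

```typescript
// Assumed, simplified shape: one document count per data stream.
interface DocCountBucket {
  dataStream: string;
  count: number;
}

interface DegradedDocs {
  dataStream: string;
  percentage: number;
}

// Joins total counts with _ignored counts and computes ignored / total * 100.
function degradedPercentages(
  total: DocCountBucket[],
  ignored: DocCountBucket[]
): DegradedDocs[] {
  const ignoredByStream = new Map<string, number>();
  for (const bucket of ignored) {
    ignoredByStream.set(bucket.dataStream, bucket.count);
  }
  return total.map(({ dataStream, count }) => {
    // Data streams without an _ignored bucket are treated as having no degraded docs.
    const ignoredCount = ignoredByStream.get(dataStream) ?? 0;
    const percentage = count > 0 ? (ignoredCount * 100) / count : 0;
    return { dataStream, percentage };
  });
}
```

Because the percentage needs both counts for every data stream, the slower total-documents query determines how long the endpoint takes overall.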
Preview:
https://github.com/elastic/kibana/assets/2748376/e645ce7a-f3f2-4b65-8332-15706f09a408