Closed quux00 closed 1 month ago
What information do we want in the cluster/details when a user specifies a remote cluster where there is no matching index?
Example: query is:
FROM logs-*,cluster-a:no_such_index | STATS sum (v)
Do we want only the local cluster present in the response or do we want the remote "cluster-a" also listed, like so:
"_clusters": {
"total": 2,
"successful": 2,
"running": 0,
"partial": 0,
"failed": 0,
"details": {
"(local)": {
"status": "successful",
"indices": "logs-1",
"took": 373,
"_shards": {
"total": 13,
"successful": 13,
"skipped": 0,
"failed": 0
}
},
"cluster-a": {
"status": "successful",
"indices": "cluster-a:no_such_index",
"took": 13,
"_shards": {
"total": 0, // while it is considered a "successful search" (no failures), the shard count of 0 indicates that no shards were searched
"successful": 0,
"skipped": 0,
"failed": 0
}
}
}
}
The above model is how it is handled in _search, so I plan to follow the same model for ES|QL unless Product or the ESQL team want to have this intentionally behave differently in ESQL.
Pinging @elastic/es-analytical-engine (Team:Analytics)
Description
Enhance query responses to include information about the shards and clusters against which the query was executed.
The goal of this issue is to provide parity between the metadata displayed for cross-cluster searches in
_search
and ES|QL.In particular, for a cross-cluster query:
Here is an example of the above metadata in a SearchResponse (using
_search
or_async_search
). Some of the fields or statuses in _search are not (yet?) supported in ES|QL so will be not used in ES|QL for the time being.The ES|QL response will be modified to look like:
Additional changes
End user docs around cross-cluster search will need to be updated.
Stakeholders
The Kibana team has asked for the cluster metadata in ES|QL to be as similar as possible to that in SearchResponses. They will need to review the proposal here.
Implementation / Detailed Design Proposal
We introduce an EsqlExecutionInfo object that is patterned (largely copied) from the SearchResponse.Clusters class in the
_search
codebase.It will track all of the information described in the previous section.
TransportEsqlQueryAction / field-caps / IndexResolver
The cluster map in EsqlExecutionInfo is first populated at this stage, since this is the first point in the ES|QL processing chain that the clusters involved in the query are resolved. For each cluster involved in the search (including the local, querying cluster), a Cluster object will be created holding the cluster alias, a status of RUNNING and skip_unavailable setting and put into the EsqlExecutionInfo that is created for this particular ES|QL search.
If an async ES|QL query is made and the async endpoint is polled while the search is still running on a given, the async search response will include the clusters with running status, like so:
Modifiications to ComputeListener
The ComputeListener and ComputeResponse is used by both local and remote clusters and for both the cross-cluster (minimize_round_trips=true) requests as well as within-cluster DataNodeRequests. In order to properly account for metadata coming back, it was necessary to provide separate methods within ComputeListener for these two pathways so that is clear what metadata to expect in the ComputeResponse.
Shard Accounting
Shard accounting is done after the Search Shards API is called. Here we get a count of the total number of shards in scope for the search and how many, if any, will be skipped (due to can_match). That information is then recorded for each cluster in the EsqlExecutionInfo.Cluster object.
This will provide accurate counts of the total number of shards and the number of skipped shards. We also fill in failed_shards=0 and successful_shards=total_shards since ES|QL does not currently return partial results when searches on some shards fail. If ES|QL does change later to support partial results in that way, then additional shard accounting will be added during/after the DataNodeRequest phase of ES|QL processing.