elastic / kibana


[Research] bsearch investigation #166206

Open thomasneirynck opened 10 months ago

thomasneirynck commented 10 months ago

The Kibana Server's internal bsearch API is foundational to the functioning of dashboards.

Architecture

[image: rough schematic of the bsearch architecture]
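To make the schematic concrete, here is a simplified, illustrative sketch of the batching idea (the class and endpoint names are hypothetical; the real implementation lives in Kibana's bfetch/data plugins and streams results back per item as they complete). Several per-chart search requests issued within a short window are coalesced into a single round trip to the Kibana server, which fans them out to Elasticsearch:

```ts
// Hypothetical sketch of the client-side batching behind bsearch.
// Several chart searches issued within a short window share one
// round trip to the Kibana server.

interface BatchItem {
  request: unknown;                      // an Elasticsearch search request body
  resolve: (response: unknown) => void;  // settles the caller's promise
  reject: (error: Error) => void;
}

class BatchedSearchClient {
  private buffer: BatchItem[] = [];
  private flushTimer: ReturnType<typeof setTimeout> | undefined;

  // Each chart calls search(); calls made within the same 10 ms window
  // are coalesced into a single bsearch request.
  search(request: unknown): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.buffer.push({ request, resolve, reject });
      this.flushTimer ??= setTimeout(() => this.flush(), 10);
    });
  }

  private async flush(): Promise<void> {
    const batch = this.buffer;
    this.buffer = [];
    this.flushTimer = undefined;
    try {
      // One HTTP round trip for the whole batch (endpoint name is illustrative).
      const response = await fetch('/internal/bsearch', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ batch: batch.map((item) => item.request) }),
      });
      const { results } = (await response.json()) as { results: unknown[] };
      batch.forEach((item, i) => item.resolve(results[i]));
    } catch (e) {
      batch.forEach((item) => item.reject(e as Error));
    }
  }
}
```

The value is fewer client-server round trips per dashboard; the cost, as noted below, is that the fan-out, re-assembly, and re-encoding work now runs on the Kibana server.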

Purpose

Areas for improvement

bsearch resolves key constraints, but also introduces new ones; primarily, it increases pressure on the Kibana Server.

Goals

Consider:

elasticmachine commented 10 months ago

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

elasticmachine commented 10 months ago

Pinging @elastic/kibana-presentation (Team:Presentation)

Dosant commented 4 months ago

Linking this relevant investigation. Could be helpful

thomasneirynck commented 4 months ago

Query consolidation is an unexplored area and could benefit from the existing architecture.

vadimkibana commented 3 months ago

A few notes, in case they help:

thomasneirynck commented 3 months ago

thx @vadimkibana!

wrt the synthetics use case, @dominiqueclarke just informed me that this usage was removed in 8.10.

thomasneirynck commented 2 months ago

Consider turning bsearch off in Serverless only: https://github.com/elastic/kibana/issues/181938

thomasneirynck commented 1 month ago

With https://github.com/elastic/kibana/pull/179663, we have been collecting more telemetry on the overhead of bsearch, specifically the custom encoding of responses into the line-delimited base64 format (sketched below).
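For context, the work being measured is roughly the following per-result transform. This is a simplified sketch, not the actual bfetch code; the function name and the choice of gzip are mine, and the real code may use a different codec:

```ts
import { gzipSync } from 'zlib';

// Simplified sketch of the per-result re-encoding measured here: each
// search result is serialized, compressed, base64-encoded, and emitted
// as one line of the streamed bsearch response. This serialize/compress/
// encode step runs on the Kibana server for every result in the batch.
function encodeBsearchChunk(result: unknown): string {
  const json = JSON.stringify(result);          // serialize the ES response
  const compressed = gzipSync(json);            // compress the payload
  return compressed.toString('base64') + '\n';  // one base64 line per result
}
```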

Metrics:

Long-tail distribution of time spent per single call.

The 75th percentile sits under 50-60 ms per bsearch call.

[image: distribution of time spent per bsearch call]

Given that a single dashboard typically issues 5-6 bsearch calls to fetch data for all of its charts, we can expect tens to hundreds of milliseconds per dashboard spent just re-encoding data in each time2data cycle (see the rough estimate below).
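A back-of-the-envelope estimate using the 75th-percentile figures above (no new measurements, just arithmetic):

```ts
// Rough per-dashboard re-encoding overhead, assuming 5-6 bsearch calls
// per dashboard at 50-60 ms each (the 75th-percentile figure above).
// Dashboards whose calls sit lower in the distribution land in the tens
// of milliseconds instead.
const lowMs = 5 * 50;   // 250 ms
const highMs = 6 * 60;  // 360 ms
console.log(`~${lowMs}-${highMs} ms spent re-encoding per dashboard refresh`);
```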

Long-tail distribution of total message size

[image: distribution of total message size]

Time spent scales linearly with message size

Encoding time grows linearly with message size, as expected.

Evidence of really large responses.

At the end of the long tail (95th percentile and above), we find evidence of really large responses, on the order of megabytes.

[image: message sizes at the tail of the distribution]

Takeaway

Removing the time spent re-encoding data in bsearch should be a broad but shallow improvement to overall time, and we should expect it to compound positively given the single-threaded nature of Node.js. While we have no metrics on this, the evidence of large data responses suggests that removing the encoding should also reduce memory pressure on the Kibana server at runtime.

Overall, removal will help work towards "thinning" the Kibana server footprint, and should yield measurable improvements to time2data (provided the Kibana server supports HTTP/2 parallelization).

kertal commented 12 hours ago

qq: is the research part done? can this be closed?