danielmitterdorfer opened this issue 9 months ago
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)
EDIT: for more up-to-date feedback, see https://github.com/elastic/kibana/issues/176208#issuecomment-2035689024
This is some initial feedback, purely based on a code audit and some offline discussion with @danielmitterdorfer.
The flame-graph server-side code in Kibana uses the ES client. It does not specify the `asStream` option. I am pretty sure that when this is not set to `true`, the client will unpack the result. When setting it to `true`, you can stream it back through the Kibana server. To do this, you must ensure the `content-*` headers match the output from Elasticsearch.
To see an example from the current Kibana code base, please check the tile routes: `asStream = true` (optionally, you can gzip here). Note: `tileStream` is the raw ES response body.
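For illustration, here is a minimal sketch of that pattern. The route path, index, and tile coordinates are made up, and the actual tile-routes code in the Maps plugin differs in detail; this only shows the `asStream` + header-forwarding idea:

```ts
router.get(
  { path: '/internal/example/tile/{z}/{x}/{y}', validate: false },
  async (context, request, response) => {
    const esClient = (await context.core).elasticsearch.client.asCurrentUser;

    // `asStream: true` tells the transport to skip body parsing and hand
    // back the raw HTTP body stream; `meta: true` exposes the headers.
    const tileResponse = await esClient.transport.request(
      { method: 'GET', path: '/my-index/_mvt/geometry/3/4/2' },
      { asStream: true, meta: true }
    );

    // Stream the body straight through and mirror the `content-*` headers
    // from Elasticsearch so the browser can decode it (real code should
    // guard against missing headers).
    return response.ok({
      body: tileResponse.body,
      headers: {
        'content-type': tileResponse.headers['content-type'] as string,
        'content-encoding': tileResponse.headers['content-encoding'] as string,
      },
    });
  }
);
```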
So, again, just from a code audit, I think we should make two changes:

1. Set `asStream` in the ES client here.
2. Stream back `resp.body`, but match the `content-*` headers (this would be here (????)).
Following up on the earlier comment.
The ES client parses the response and re-encodes it. See here:
https://github.com/elastic/elastic-transport-js/blob/main/src/Transport.ts#L525 https://github.com/elastic/elastic-transport-js/blob/main/src/Serializer.ts#L62
**`asStream`**

The ideal solution would be to use the `asStream` option to stream the result back.

Add an `asStream` option here: https://github.com/elastic/kibana/blob/72a377d5b2927d75537838b7077e7bd1e340a20f/x-pack/plugins/observability_solution/profiling_data_access/server/utils/create_profiling_es_client.ts#L146-L149
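A hedged sketch of what that could look like; the method and parameter names are assumptions loosely based on the linked file, not the actual code:

```ts
// Inside the profiling ES client wrapper; `esClient` is the scoped
// Elasticsearch client (names assumed for illustration).
profilingFlamegraph({ query, sampleSize }: { query: object; sampleSize: number }) {
  return esClient.transport.request(
    {
      method: 'POST',
      path: encodeURI('/_profiling/flamegraph'),
      body: { query, sample_size: sampleSize },
    },
    // With `asStream: true` the transport returns the raw HTTP body stream
    // instead of parsing the JSON and re-serializing it later in Kibana.
    { asStream: true, meta: true }
  );
},
```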
In order for this to work, the ES response handling needs to change (see below).
**Is `totalSeconds` really needed?**

This adds a new property, `totalSeconds`, wrapping it in a new response object. This would prevent streaming back the result. However, it is only used for tooltips, and it seems this value is optional (?)
**Ensure `content-*` encodings are transferred from the ES response headers**

The client currently expects plain JSON. This will need to be adjusted to handle a stream.
^ This is a broad investigation. I may have missed some intricacies, so it would be good to get some additional feedback from the Obs team on this as well.
Thanks for the investigation @thomasneirynck and @danielmitterdorfer!
> Ensure `content-*` encodings are transferred from the ES response headers.
The ES response content is gzip-compressed while the Kibana response is `br` (brotli) compressed. We did this to reduce the content size by ~50% and to reduce the transfer time to the Kibana client side. See https://github.com/elastic/kibana/pull/142334

(From what I remember, in my hotel room with a slow network I had to wait ~50s for a gzip-compressed flamegraph and only ~25s when brotli-compressed. But FYI, the compression rate also depends on the payload, and the response content format has changed since then.)
Another caveat with simply transferring the headers is that you cannot assume that the browser (or client) negotiates exactly the same `content-*` fields as Kibana server and ES negotiate. Sure enough, most browsers are lenient enough to accept a gzip content-encoding even when brotli has been negotiated. But "most" is not "all", and we possibly don't want to run into support issues here.
So I would vote for re-compression in case the content-encoding isn't the same on both sides. This is a simple decompression and re-compression without parsing, and it should be fast enough.
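A minimal sketch of that step using Node's built-in zlib streams, assuming `esBody` is the raw stream obtained via `asStream: true`:

```ts
import { createGunzip, createBrotliCompress } from 'zlib';
import type { Readable } from 'stream';

// Re-encode a gzip-compressed ES body as brotli without parsing the JSON:
// gunzip -> brotli is a pure byte-stream transformation.
function gzipToBrotli(esBody: Readable): Readable {
  return esBody.pipe(createGunzip()).pipe(createBrotliCompress());
}
```

The Kibana response would then carry `content-encoding: br` while the `content-type` stays whatever Elasticsearch sent.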
A possible future follow-up could be to allow brotli and/or zstd on the ES side (zstd has been included in ES recently, so that's just a matter of time/effort until we can use it).
> Is `totalSeconds` really needed?
The `totalSeconds` value is derived from user settings (UI) only. It is `timeTo - timeFrom`, both request params. So it can be calculated on the client side as well (alternatively, it can be added to the flamegraph response if that is a better option).
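In other words, something like this hedged sketch on the client side (whether the params are epoch seconds or milliseconds depends on the actual request shape):

```ts
// timeFrom/timeTo are the request params the client already has;
// if they are epoch milliseconds, divide the difference by 1000.
const totalSeconds = timeTo - timeFrom;
```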
Hi all,
Thanks for raising this. Is anyone able to summarise the kind of speeds we're seeing and the benefit we could gain?
Trying to get a sense of the impact on customers.
> Thanks for raising this. Is anyone able to summarise the kind of speeds we're seeing and the benefit we could gain?
Hey @roshan-elastic! Sure, Kibana adds around 2 seconds of overhead. So in the scenario that I've tested we would reduce the response time from roughly 6 to roughly 4 seconds.
Thanks @danielmitterdorfer
FYI I've added this as a top-20 issue in the backlog
**Kibana version:** 8.12.0

**Elasticsearch version:** 8.12.0

**Server OS version:** Linux (ESS)

**Browser version:** N/A (issue is browser-independent)

**Browser OS version:** N/A (issue is browser-independent)

**Original install method (e.g. download page, yum, from source, etc.):** ESS

**Describe the feature:**
When we render a flamegraph in Universal Profiling, an Elasticsearch API sends the response already in a format that is suitable for rendering directly in the browser. Looking at APM, we can see that we still spend some time in Kibana server when it basically "only" needs to pass the response from Elasticsearch to the browser as-is. In the example APM trace below, that time is around 2 seconds (the white gap after `POST /_profiling/flamegraph` ends):

If we profile what Kibana server is doing, it seems that it is deserializing the response from Elasticsearch and then serializing it again. However, I believe we can eliminate that step because the response should already be in the exact format that the browser expects (if not, I propose we adapt the ES API). Below is an annotated flamegraph that shows where Kibana server spends time serializing and deserializing the response from Elasticsearch: