Open mshustov opened 2 years ago
Pinging @elastic/kibana-core (Team:Core)
The v7 client has two compression options:
- suggestCompression: if true, asks ES for compressed data via the accept-encoding header.
- compression: if true, sends compressed data to ES.

The v8 client has a single compression option, which does both.
Furthermore, if you enable compression and are using maxResponseSize, remember to configure maxCompressedResponseSize as well.
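Putting the v8 options above together, a minimal sketch of the client configuration could look like this (the option names come from the elasticsearch-js docs quoted above; the node URL and the size values are placeholders):

```javascript
// Hypothetical v8 elasticsearch-js client options object.
// `compression`, `maxResponseSize`, and `maxCompressedResponseSize` are the
// documented option names; the concrete values here are illustrative only.
const clientOptions = {
  node: 'http://localhost:9200', // placeholder ES endpoint
  compression: true, // v8: gzips requests AND asks for gzipped responses
  maxResponseSize: 100 * 1024 * 1024, // cap on the decompressed body
  maxCompressedResponseSize: 20 * 1024 * 1024, // cap on the on-the-wire body
};

console.log(clientOptions.compression);
```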
The client defaults to false unless you are using Elastic Cloud (detected via the cloud id option), in such case, it will enable compression by default as it's recommended with Elastic Cloud.
The client defaults to false unless you are using Elastic Cloud (detected via the cloud id option), in such case, it will enable compression by default as it's recommended with Elastic Cloud.
Ok, we have to set compression: true explicitly then, since Kibana doesn't configure the cloud option.
We should conduct load testing for Cloud and on-prem instances to decide whether Kibana will set compression: true by default or expose it as an elasticsearch.compression configuration setting.
We could start by exposing this new elasticsearch.compression config property while preserving the current default value of false, and then, depending on the results of the perf/load testing, decide if we want to switch the default to true later. This would allow customers who really want/need this feature to use it asap.
I agree with this approach. One of our customers, with 1000 users in Kibana, 3000 dashboards and 4000 queries/minute, tested it and saw network traffic drop by 90%, going from 500GB/hour to 50GB/hour.
Of course, not all queries will benefit (small queries with a very small result set can be slower) and CPU usage will increase a bit. But let the users decide.
I opened https://github.com/elastic/kibana/pull/124009 to introduce a new elasticsearch.compression configuration property that will default to false (the effective current value) for now.
https://github.com/elastic/kibana/pull/124009 was merged, I updated the issue accordingly and added a subtasks list.
I performed some testing around the performance and bandwidth impact of enabling compression for the Kibana<->ES communications.
I couldn't find any proper way to monitor the bandwidth usage when running the load tests, so I fell back to using a homebrew proxy between Kibana and ES to monitor the request and response size of some queries while manually navigating within Kibana.
No big surprise here, the gain is what you would expect from enabling compression on any HTTP-based communication. Depending on the size of the request and response, there is a 20% to 90% compression.
I won't list everything here, but for example the /_search response associated with loading the sample data's dashboard from the dashboard listing page gets compressed by around 85%, and that's only returning 3 dashboard documents.
// without compression
request length: 263
response length: 66893
// with compression
request length: 181
response length: 10884
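Sanity-checking the ~85% figure from the measurements above:

```javascript
// Reduction computed from the proxy measurements quoted above.
const uncompressed = 66893;
const compressed = 10884;
const reduction = 1 - compressed / uncompressed;
console.log(`${(reduction * 100).toFixed(1)}% smaller`); // 83.7% smaller
```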
Overall, we should expect a reduction in bandwidth usage of at least 60% when enabling compression, and probably more like 80%, depending on the specific usage of the instance/customer, which is significant.
I couldn't perform load testing against the whole stack on Cloud, given that elasticsearch.compression is not allow-listed, so I performed two suites of tests:
(all suites were run 3 times, with very similar results, so I'm only showing one of the 3 results for each)
This was surprising. Given that compression/decompression is done outside of the main event loop in node, I was expecting a negligible impact on performance. Tests show that this is far from negligible when Kibana/ES are under heavy load in that scenario.
Overall mean is +50% when compression is enabled (900 vs 600). When drilling down, it varies between +15% and +100% depending on the request. Presumably, the bigger the response, the higher the difference in mean value.
Min value is not affected much for any endpoint
Overall 50th pct and 75th pct are doubled (!). When drilling down, the results vary a lot, from no difference to twice the response time.
95th and 99th pcts are less impacted than the previous ones, with an approx. 25% difference between compression and no compression.
The results are way more acceptable than during local testing, which tends to indicate that either the bottleneck during local testing was more on the ES side than on the Kibana side, or that adding real network latency (which can't be reproduced when hitting the local loopback in a local-to-local scenario) significantly reduces the relative compression/decompression overhead.
Min value is not affected much for any endpoint
The 50th pct is 25% higher with compression enabled, and mostly consistent for each endpoint
75th, 95th and 99th are almost the same with or without compression
Should we set elasticsearch.compression to true for all Cloud instances?
Good question, and I'm not sure who should be answering it. We should probably reach out to Cloud to decide whether the bandwidth reduction is worth the performance impact?
@stacey-gammon @lukeelmers wdyt?
Allow-list elasticsearch.compression on Cloud
If the answer to the previous question is 'no', then imho yes. We confirmed that it works correctly and that the performance impact is not significant enough to restrict the usage of this option.
This was surprising. Given that compression/decompression is done outside of the main event loop in node, I was expecting a negligible impact on performance. Tests show that this is far from negligible when Kibana/ES are under heavy load in that scenario.
Maybe you faced this problem with many parallel requests initiated by kibana-load-testing
? https://nodejs.org/docs/latest-v16.x/api/zlib.html#threadpool-usage-and-performance-considerations
Have you tried to reduce the number of parallel connections?
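The zlib doc linked above notes that async zlib work runs on the libuv threadpool (default size 4), so many parallel compressions can queue behind each other. One experiment worth noting (a tuning idea, not an official Kibana recommendation) is raising the pool size before starting the Node process; the variable only takes effect if set before startup:

```shell
# UV_THREADPOOL_SIZE must be in the environment before node starts;
# here we just confirm the process sees it.
UV_THREADPOOL_SIZE=8 node -p "process.env.UV_THREADPOOL_SIZE"
```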
Should we set elasticsearch.compression to true for all Cloud instances?
Maybe we can spend some additional time testing different compression settings? I'm wondering if setting the compression level to zlib.constants.Z_BEST_SPEED could help us a lot. From the nodejs docs https://nodejs.org/docs/latest-v16.x/api/zlib.html#compressor-options:
The speed of zlib compression is affected most dramatically by the level setting. A higher level will result in better compression, but will take longer to complete. A lower level will result in less compression, but will be much faster.
@delvedor Does the ES client always use compression if specified? Does it make sense to add a threshold? The overhead of compressing small objects can be higher than the benefit of sending a smaller chunk.
Maybe you faced this problem with many parallel requests initiated by kibana-load-testing
Fairly possible. I did reduce the number of concurrent users to 100 for local testing though (200 users was apparently just too much for my machine; with or without compression I was getting lots of errors once the ramp-up was over).
Maybe we can spend some additional time testing different compression settings? I'm wondering if changing setting compression level to zlib.constants.Z_BEST_SPEED can help us a lot
Worth a try, but as you mentioned, I don't think we have control over the compression configuration the elasticsearch client uses (at least atm)? @delvedor could you confirm that?
Maybe we can spend some additional time testing different compression settings?
Also, after thinking a bit more about it, this should only impact the compression performance of requests toward ES, not the decompression of responses (as we don't have control over the compression settings of the ES server), and given that responses are significantly larger than requests, I'm not sure this will impact the benchmark much.
Also, after thinking a bit more about it, this should only impact the compression performance of requests toward ES, not the decompression of responses (as we don't have control over the compression settings of the ES server), and given that responses are significantly larger than requests, I'm not sure this will impact the benchmark much.
I didn't benchmark this, but it seems that decompression is way faster than compression. From http://facebook.github.io/zstd/
I think we should start with allowing customers to turn this setting on, so we can test it first on internal clusters. Then we should have a way to compare the performance and the data transfer rates to give us confidence that turning it on for all clusters won't cause issues.
Config option was added in https://github.com/elastic/kibana/pull/124009 for v8.1.0+
@stacey-gammon should I open a PR to add the config option to cloud's allowlist?
yes, that'd be great!
Unassigning as no longer actively working on this
Kibana relies on the elasticsearch-js client defaults with compression: false
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/basic-config.html
While enabling compression adds some runtime performance overhead, it might drastically reduce transmission time and increase effective network bandwidth. We should conduct load testing for Cloud and on-prem instances to decide whether Kibana will set compression: true by default or expose it as an elasticsearch.compression configuration setting.
A side note: it might be interesting to calculate how much this affects the Data transfer reductions initiative on Cloud.
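For reference, once exposed, the setting from https://github.com/elastic/kibana/pull/124009 would be turned on like any other elasticsearch.* option in kibana.yml:

```yaml
# kibana.yml (setting added in https://github.com/elastic/kibana/pull/124009;
# default is false)
elasticsearch.compression: true
```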
Subtasks
compression for the ES client - https://github.com/elastic/kibana/pull/124009