elastic / kibana

Improving CSV generation by supporting concurrent background tasks #181064

Open mikecote opened 4 months ago

mikecote commented 4 months ago

The report:execute task currently has its per-node concurrency set to 1. When a task type is configured this way, Task Manager prevents more than one task of that type from running on the same Kibana node at any given time.

The following exposes some limitations that we have in serverless:

What I propose is running CSV generation tasks under a new task type, report:execute-csv, that doesn't set maxConcurrency in its task definition, while keeping report:execute multi-purpose in case there are still CSV tasks in the queue. This would allow 10x throughput per Kibana node for generating CSVs and would benefit serverless, ESS, and on-prem users. One thing to keep an eye on: with 10x concurrency we also put 10x the memory and CPU pressure on the node, and I am not familiar with how much of each resource a single task needs.
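To make the proposal concrete, here is a minimal sketch of how a per-type concurrency cap limits what a node can pick up. It is a simplified model, not Task Manager's actual code: `claimTasks`, `TaskDefinition`, `QueuedTask`, and the node-wide capacity of 10 (chosen to match the "10x" figure above) are all assumptions made for illustration.

```typescript
// Hypothetical, simplified model of per-type concurrency; not Kibana's Task Manager implementation.
interface TaskDefinition {
  // Optional per-node cap for this task type. When undefined, only the node-wide capacity applies.
  maxConcurrency?: number;
}

interface QueuedTask {
  id: string;
  type: string;
}

// Assumed node-wide capacity, picked to match the "10x" figure in the proposal.
const NODE_CAPACITY = 10;

const definitions: Record<string, TaskDefinition> = {
  'report:execute': { maxConcurrency: 1 }, // existing type: one at a time per node
  'report:execute-csv': {}, // proposed type: no per-type cap
};

// Returns the tasks a single node could start in one pass over the queue.
function claimTasks(
  queue: QueuedTask[],
  defs: Record<string, TaskDefinition>,
  capacity: number
): QueuedTask[] {
  const claimed: QueuedTask[] = [];
  const runningPerType: Record<string, number> = {};
  for (const task of queue) {
    if (claimed.length >= capacity) break; // node is full
    const def = defs[task.type];
    if (def === undefined) continue; // unknown task type, skip it
    const running = runningPerType[task.type] || 0;
    // Skip the task when its type already reached its per-type cap.
    if (def.maxConcurrency !== undefined && running >= def.maxConcurrency) continue;
    runningPerType[task.type] = running + 1;
    claimed.push(task);
  }
  return claimed;
}

const legacyQueue: QueuedTask[] = [];
const csvQueue: QueuedTask[] = [];
for (let i = 0; i < 10; i++) {
  legacyQueue.push({ id: `legacy-${i}`, type: 'report:execute' });
  csvQueue.push({ id: `csv-${i}`, type: 'report:execute-csv' });
}

const legacyClaimed = claimTasks(legacyQueue, definitions, NODE_CAPACITY);
const csvClaimed = claimTasks(csvQueue, definitions, NODE_CAPACITY);

console.log(legacyClaimed.length, csvClaimed.length);
```

In this model, ten queued report:execute tasks yield a single claim, while ten report:execute-csv tasks are all claimed, which is where the memory and CPU concern comes from.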

elasticmachine commented 4 months ago

Pinging @elastic/appex-sharedux (Team:SharedUX)

kobelb commented 4 months ago

Assuming my understanding of https://github.com/elastic/kibana/pull/108485 is accurate, and still relevant, running 10 CSV exports concurrently has the chance of causing Kibana to crash due to an OOM. @elastic/appex-sharedux can you all confirm that each CSV export task could use approximately 100MB of memory?

tsullivan commented 4 months ago

@kobelb Your understanding of the current configuration of how we chunk the reports seems accurate. @vadimkibana has created an issue to hardcode the chunk size to 4MB: https://github.com/elastic/kibana/issues/180829. That will stop reports from causing OOM in 1GB instances.

kobelb commented 4 months ago

Thanks, @tsullivan. If we use 4 MB chunks, then I don't have concerns about doing 10 concurrently.
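The reasoning in this exchange can be checked with back-of-the-envelope arithmetic. The sketch below assumes that a task's memory footprint is dominated by its chunk buffer, which the thread does not confirm, so treat the results as rough bounds. The helper `peakBufferMb` is hypothetical; the 100 MB, 4 MB, 10-task, and 1 GB figures come from the comments above.

```typescript
// Rough estimate only: assumes each task's memory use is dominated by its chunk buffer.
function peakBufferMb(concurrentTasks: number, perTaskMb: number): number {
  return concurrentTasks * perTaskMb;
}

const INSTANCE_MB = 1024; // the 1GB instance size mentioned in the thread
const CONCURRENT_TASKS = 10; // the concurrency under discussion

const withLargeChunks = peakBufferMb(CONCURRENT_TASKS, 100); // ~100 MB per export
const withSmallChunks = peakBufferMb(CONCURRENT_TASKS, 4); // 4 MB chunks

console.log(`100 MB per task: ${withLargeChunks} MB of ${INSTANCE_MB} MB`);
console.log(`4 MB per task: ${withSmallChunks} MB of ${INSTANCE_MB} MB`);
```

Under these assumptions, ten exports at roughly 100 MB each would need about 1000 MB, nearly the whole 1GB instance, while ten exports with 4 MB chunks would need about 40 MB.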

tsullivan commented 4 months ago

Depends on