elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Multi threaded shard level downsampling #96757

Open salvatore-campagna opened 1 year ago

salvatore-campagna commented 1 year ago

Description

Right now downsampling runs as a single-threaded task. This means that downsampling large indices can take considerable time. We need a solution that increases downsampling throughput and thereby reduces the overall latency of a downsampling operation. One option is to make a single downsampling operation, which happens at shard level, multi-threaded.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

salvatore-campagna commented 1 year ago

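The approach we would like to adopt is to have a thread pool of fixed size, let's say n. This means we will have n threads running concurrently, with ids 0 to n - 1. Each document is assigned a downsampling thread id equal to hash(tsid) % n, and each thread processes only the documents whose downsampling thread id matches its own id.

For instance, if we have a thread pool with 4 threads, every document gets a downsampling thread id of 0, 1, 2 or 3, and each of the 4 threads processes only the documents carrying its id. This way we will have 4 threads running concurrently, and all documents of a given time series are handled by the same thread.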
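
To make the partitioning rule concrete, here is a minimal Java sketch. It is not the Elasticsearch implementation: the class name `TsidPartitioningSketch`, the methods `workerFor` and `partition`, the use of plain strings as tsids, and `String.hashCode()` as the hash function are all assumptions made for illustration. Each worker scans the same input and keeps only the tsids that map to its own id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from Elasticsearch.
final class TsidPartitioningSketch {

    // Maps a tsid to a worker id in [0, n). floorMod keeps the result
    // non-negative even when hashCode() returns a negative value.
    static int workerFor(String tsid, int n) {
        return Math.floorMod(tsid.hashCode(), n);
    }

    // Starts n workers on a fixed-size pool. Worker i collects only the
    // entries for which workerFor(tsid, n) == i. Each entry of the input
    // list stands for the tsid of one document.
    static Map<Integer, List<String>> partition(List<String> tsids, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int workerId = i;
                futures.add(pool.submit(() -> {
                    List<String> mine = new ArrayList<>();
                    for (String tsid : tsids) {
                        if (workerFor(tsid, n) == workerId) {
                            mine.add(tsid); // a real worker would downsample here
                        }
                    }
                    return mine;
                }));
            }
            // Wait for every worker and gather its share under its id.
            Map<Integer, List<String>> result = new HashMap<>();
            for (int i = 0; i < n; i++) {
                result.put(i, futures.get(i).get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `TsidPartitioningSketch.partition(tsids, 4)` returns four lists keyed 0 to 3. Because the worker id depends only on the tsid, two documents with the same tsid always end up in the same list, so in this sketch no per-time-series state has to be shared between threads.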

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)