Status: Open. salvatore-campagna opened this issue 1 year ago.
Pinging @elastic/es-analytics-geo (Team:Analytics)
The approach we would like to adopt is a thread pool of fixed size, say n, so that n threads run concurrently. Each thread processes only the documents whose downsampling thread id equals hash(tsid) % n.
For instance, with a pool of 4 threads, each thread processes the documents whose thread id is 0, 1, 2 or 3 respectively, so 4 threads run concurrently.
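A minimal sketch of the hash(tsid) % n partitioning, assuming documents are reduced to a hypothetical Doc(tsid, value) record and that String.hashCode() stands in for whatever hash the real implementation would use. The class and method names (PartitionedDownsampler, workerFor, downsample) are illustrative only, not Elasticsearch APIs. Because every document of a given tsid maps to the same thread id, each time series is owned by exactly one thread, so per-series aggregation state needs no locking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedDownsampler {

    // Hypothetical document: a time series id (tsid) and one metric value.
    record Doc(String tsid, double value) {}

    // Downsampling thread id for a tsid, in [0, n).
    // floorMod keeps the result non-negative even if hashCode() is negative.
    static int workerFor(String tsid, int n) {
        return Math.floorMod(tsid.hashCode(), n);
    }

    // Runs n workers on a fixed-size pool. In this simplified sketch every
    // worker scans all documents but only aggregates (here: sums) those whose
    // tsid maps to its own thread id.
    static Map<String, Double> downsample(List<Doc> docs, int n)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Map<String, Double>>> futures = new ArrayList<>();
            for (int id = 0; id < n; id++) {
                final int workerId = id;
                futures.add(pool.submit(() -> {
                    Map<String, Double> local = new HashMap<>();
                    for (Doc d : docs) {
                        if (workerFor(d.tsid(), n) == workerId) {
                            local.merge(d.tsid(), d.value(), Double::sum);
                        }
                    }
                    return local;
                }));
            }
            // Partitions are disjoint by tsid, so merging never overwrites data.
            Map<String, Double> result = new TreeMap<>();
            for (Future<Map<String, Double>> f : futures) {
                result.putAll(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Doc> docs = List.of(
                new Doc("host-a", 1.0), new Doc("host-b", 2.0),
                new Doc("host-a", 3.0), new Doc("host-c", 4.0));
        System.out.println(downsample(docs, 4)); // {host-a=4.0, host-b=2.0, host-c=4.0}
    }
}
```

Having each worker filter the full document stream keeps the sketch short; a real implementation would more likely route documents to their owning thread once rather than have n threads read everything.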
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Description
Right now, downsampling runs as a single-threaded task, so downsampling large indices can take considerable time. We need a solution that increases downsampling throughput and thereby reduces the overall latency of a downsampling operation. One option is to make each downsampling operation, which runs at shard level, multi-threaded.