elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Find right combination of parameters for ZSTD `best_speed` #108863

Open salvatore-campagna opened 3 months ago

salvatore-campagna commented 3 months ago

Description

Following the investigation we conducted after introducing ZSTD, we agreed that we need to find a better set of parameters for the `best_compression` codec, which is the codec we are going to use in LogsDB. We need to try settings that are less aggressive about storage footprint reduction. The goal is to find the "sweet spot" that keeps the storage benefits without sacrificing latency, so that we avoid regressions in query latency, dashboard loading times and so on.

This means we will need to try a different set of parameters, starting with a smaller block size, perhaps 128k or 64k. Ideally we would like to keep the compression level as it is, but we might need to change it too. Keep in mind that the choice of these parameters affects both the CPU and the memory usage of ZSTD, so we need to measure both. A "sweet spot" that is good in terms of query latency and storage footprint but too hungry in terms of memory and/or CPU would increase our hardware resource usage, and with it our costs.
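
To make the block size trade-off concrete, here is a minimal sketch of the kind of sweep described above. It is not the Elasticsearch codec and not a Rally benchmark: it uses Python's standard `zlib` as a stand-in for ZSTD (so absolute numbers will differ), the 256 KiB entry is a hypothetical baseline rather than the current codec setting, and `make_sample_logs` and `sweep_block_sizes` are names invented for this illustration. Each block is compressed independently, and the time to decompress a single block is used as a rough proxy for the cost of fetching one document.

```python
import time
import zlib


def make_sample_logs(lines: int = 10_000) -> bytes:
    """Build deterministic, repetitive log-like data (hypothetical sample)."""
    rows = (
        f"2024-05-20T12:00:{i % 60:02d} INFO host-{i % 7} GET /api/items/{i} 200\n"
        for i in range(lines)
    )
    return "".join(rows).encode("utf-8")


def sweep_block_sizes(data: bytes, block_sizes, level: int = 3):
    """Compress `data` in independent blocks and report size and timing per block size.

    zlib stands in for ZSTD here; `level` is a zlib level, not a ZSTD level.
    """
    results = []
    for block_size in block_sizes:
        blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

        start = time.perf_counter()
        compressed = [zlib.compress(block, level) for block in blocks]
        compress_s = time.perf_counter() - start

        # Reading one document means decompressing its whole block,
        # so time a single-block decompression as a latency proxy.
        start = time.perf_counter()
        restored = zlib.decompress(compressed[0])
        one_block_s = time.perf_counter() - start
        assert restored == blocks[0]

        compressed_size = sum(len(c) for c in compressed)
        results.append({
            "block_size": block_size,
            "ratio": len(data) / compressed_size,
            "compress_s": compress_s,
            "one_block_decompress_s": one_block_s,
        })
    return results


if __name__ == "__main__":
    sample = make_sample_logs()
    # 256 KiB is an assumed baseline; 128 KiB and 64 KiB are the candidates named above.
    for row in sweep_block_sizes(sample, [256 * 1024, 128 * 1024, 64 * 1024]):
        print(
            f"{row['block_size'] // 1024:>4} KiB  "
            f"ratio={row['ratio']:.2f}  "
            f"compress={row['compress_s'] * 1e3:.2f} ms  "
            f"one-block decompress={row['one_block_decompress_s'] * 1e3:.3f} ms"
        )
```

A sweep like this only covers storage footprint and CPU time. Memory usage of the compressor is not visible from it and would still need a profiler or a dedicated measurement.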

Indexing throughput and search latency are good ways to evaluate where we stand in terms of CPU usage, but we also need to track memory usage, which is something our Rally benchmarks are missing at the moment. Maybe we could track memory usage by attaching a profiler.

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)