elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Find right combination of parameters for ZSTD `best_speed` #108863

Open salvatore-campagna opened 5 months ago

salvatore-campagna commented 5 months ago

Description

As a result of the investigation we conducted after introducing ZSTD, we agreed that we need to find a better set of parameters for the best_compression codec, which is the codec we are going to use in LogsDB. We need to try settings that are less aggressive about reducing storage footprint. The goal is to find the "sweet spot" that keeps the storage benefits without regressing query latency, dashboard loading times and so on.

This means we will need to try a different set of parameters, starting with a smaller block size, perhaps 128k or 64k. Ideally we would like to keep the compression level as it is, but we might need to change it too. Keep in mind that the choice of these parameters affects both ZSTD CPU and memory usage, which we need to measure. We would like to avoid a "sweet spot" that is good in terms of query latency and storage footprint but too hungry in terms of memory and/or CPU, since that would impact our hardware resource usage and therefore our costs.
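To make the block-size trade-off concrete, here is a minimal sketch of the kind of sweep this implies. It is not the Elasticsearch codec: it uses Python's standard-library `zlib` as a stand-in for ZSTD, synthetic log-like documents, and hypothetical helper names (`make_docs`, `split_blocks`, `evaluate`). It reports the stored-to-raw size ratio and the time to decompress one block, which is roughly the cost paid to fetch a single document.

```python
import json
import time
import zlib


def make_docs(n):
    """Build n synthetic, log-like JSON documents (hypothetical data)."""
    return [
        json.dumps({"ts": 1700000000 + i, "host": f"host-{i % 16}",
                    "level": "INFO", "msg": f"request {i} served"}).encode()
        for i in range(n)
    ]


def split_blocks(docs, block_size):
    """Group whole documents into blocks of roughly block_size bytes."""
    blocks, current, size = [], [], 0
    for doc in docs:
        current.append(doc)
        size += len(doc)
        if size >= block_size:
            blocks.append(b"".join(current))
            current, size = [], 0
    if current:
        blocks.append(b"".join(current))
    return blocks


def evaluate(docs, block_size, level=3):
    """Compress docs block by block; zlib is only a stand-in for ZSTD."""
    blocks = split_blocks(docs, block_size)
    compressed = [zlib.compress(b, level) for b in blocks]
    raw = sum(len(b) for b in blocks)
    stored = sum(len(c) for c in compressed)
    # Reading one document means decompressing its whole block.
    start = time.perf_counter()
    zlib.decompress(compressed[0])
    elapsed = time.perf_counter() - start
    return {"block_size": block_size,
            "ratio": stored / raw,
            "first_block_decompress_s": elapsed}


if __name__ == "__main__":
    docs = make_docs(5000)
    for size in (64 * 1024, 128 * 1024, 256 * 1024):
        print(evaluate(docs, size))
```

The 256k entry is only an arbitrary larger comparison point, not the codec's current setting. Real numbers would have to come from ZSTD itself and from Rally runs; the sketch only shows which quantities to compare per block size.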

Indexing throughput and search latency are good proxies for CPU usage, but we also need to track memory usage, which our Rally benchmarks do not currently measure. Maybe we could track memory usage by attaching a profiler.
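As an illustration of measuring peak memory per block size, the sketch below uses Python's `tracemalloc` around a compress/decompress round trip. This is an assumption-heavy stand-in: it uses `zlib` rather than ZSTD, the helper `peak_memory_for_block` is hypothetical, and it only sees allocations made through Python's allocators, not JVM heap or native ZSTD buffers. For Elasticsearch itself, a JVM or native profiler would be needed.

```python
import tracemalloc
import zlib

# One synthetic log line, repeated to fill a block (hypothetical data).
LINE = b"2024-05-21T10:00:00Z INFO request served host-1\n"


def peak_memory_for_block(block_size, level=3):
    """Return peak traced bytes while round-tripping one block of block_size bytes."""
    block = (LINE * (block_size // len(LINE) + 1))[:block_size]
    tracemalloc.start()
    try:
        compressed = zlib.compress(block, level)
        restored = zlib.decompress(compressed)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    if restored != block:
        raise ValueError("round trip did not restore the original block")
    return peak


if __name__ == "__main__":
    for size in (64 * 1024, 128 * 1024, 256 * 1024):
        print(size, peak_memory_for_block(size))
```

The peak is at least the block size, because the decompressed block has to be held in memory. That is the part of the memory cost that scales directly with the block-size parameter.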

elasticsearchmachine commented 5 months ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)

martijnvg commented 1 month ago

The feature flag for zstd when index.codec=best_compression has been removed: #112665

Lowering blockDocCount for default / best speed from 128 to 96 (#112098) hasn't resulted in an improvement in get-by-id performance in our get-by-id visualizations: https://elasticsearch-benchmarks.elastic.co/#tracks/tsdb/nightly/default/30d (all visualizations under the nightly-tsdb-indexing-throughput-revisions table).