elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ILM] Additional parameters "refresh_interval" in ILM Policy #108869

Open Harmlos opened 5 months ago

Harmlos commented 5 months ago

Description

Currently, when modifying the lifecycle of indices, it is possible to set a priority for the index, which allows for controlling the order of index recovery.
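For reference, the existing priority control could look like the sketch below, which builds a policy body using the `set_priority` action for the warm and cold phases. The function name, the `min_age` values, and the priority numbers are illustrative assumptions, not values from this issue; the body would be sent with `PUT _ilm/policy/<policy-name>`.

```python
import json


def build_priority_policy(warm_priority=50, cold_priority=0):
    """Return an ILM policy body that lowers recovery priority in later phases.

    The min_age values and priorities are placeholders chosen for illustration.
    """
    return {
        "policy": {
            "phases": {
                "warm": {
                    "min_age": "7d",
                    "actions": {"set_priority": {"priority": warm_priority}},
                },
                "cold": {
                    "min_age": "30d",
                    "actions": {"set_priority": {"priority": cold_priority}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to PUT _ilm/policy/<policy-name>.
    print(json.dumps(build_priority_policy(), indent=2))
```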

Another important index parameter is refresh_interval, which is typically set to a small value for indices under heavy write load and to a larger value for indices that no longer receive updates.

For example, ingesting a data stream from Elastic Agent on 1000 computers can produce an index of about 100 GB per day. By default, the refresh_interval is set to 5 seconds, so data can be viewed in near real time. However, once the index transitions to the cold phase, it continues to refresh every 5 seconds. This exposes no new data, yet it imposes a significant load on the disk subsystem for data that was written over 30 days ago.

I would like to request the ability to set the refresh_interval in the warm and cold phases. For example, the interval could be set to 60 seconds for the warm phase and 600 seconds for the cold phase. Currently, this has to be done with external scripts.
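A minimal sketch of such an external script is shown below, assuming an unauthenticated cluster reachable over plain HTTP. It uses the update index settings API (`PUT /<index>/_settings`) with the 60 s and 600 s values mentioned above. The function names, the `PHASE_INTERVALS` mapping, and the way the phase is passed in are hypothetical; a real script would also need authentication and a way to look up each index's current phase.

```python
import json
import urllib.request

# Intervals from the request above; the mapping itself is a hypothetical helper.
PHASE_INTERVALS = {"warm": "60s", "cold": "600s"}


def build_refresh_request(base_url, index, phase):
    """Build a PUT /<index>/_settings request that sets refresh_interval."""
    interval = PHASE_INTERVALS[phase]
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def apply_refresh_interval(base_url, index, phase):
    """Send the request and return the parsed JSON response (needs a live cluster)."""
    request = build_refresh_request(base_url, index, phase)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `apply_refresh_interval("http://localhost:9200", "my-index", "cold")` would apply the 600-second interval to a single index; the URL and index name here are placeholders.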

elasticsearchmachine commented 5 months ago

Pinging @elastic/es-data-management (Team:Data Management)

dakrone commented 5 months ago

@Harmlos What version of Elasticsearch are you using? I ask because I believe optimizations have been added to refreshing so that shards that are not being actively indexed do not need to refresh. I would not expect any difference in impact between an index with a 5 second refresh interval and one with a 600 second interval, assuming that no documents are being written to the index.

Harmlos commented 4 months ago

I tested it on a cluster running version 8.12.2 with around 2000 shards on each server in the cluster. I monitored the CPU load of the application's Docker image before and after the change, as well as the iowait parameter.

After manually changing the refresh_interval value for user and system indices, the CPU load decreased, and the number of disk read interrupts also decreased.