elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ILM index rollover and delete based on index size #93289

Open jackwang713 opened 1 year ago

jackwang713 commented 1 year ago

Description

I have run into this problem: I want a data stream to keep only about 100GB of historical data, instead of retaining data by time (for example, only the last 60 days). As far as I know, ILM currently moves to the next phase based only on min_age. Is it possible to add a condition based on the number of rolled-over indices?

Example: with rollover at an index size of 50GB, set the ILM policy to keep 2 rolled-over indices and delete the oldest one as soon as a third appears, so that the data stream keeps about 100GB of historical data.
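The rule in this example can be sketched as a small selection function. This is only an illustration of the requested behaviour, not an existing ILM feature. The function name `indices_to_delete` and the index names are hypothetical, and the input is assumed to be the data stream's backing indices ordered oldest to newest, with the current write index last.

```python
from typing import List


def indices_to_delete(backing_indices: List[str], keep: int) -> List[str]:
    """Return the rolled-over indices that exceed the `keep` limit.

    Assumption: `backing_indices` is ordered oldest to newest and the
    last entry is the current write index, which is never deleted.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    rolled_over = backing_indices[:-1]  # everything except the write index
    excess = len(rolled_over) - keep
    # The oldest indices sit at the front of the list.
    return rolled_over[:excess] if excess > 0 else []


# Hypothetical names: three rolled-over indices plus the write index.
names = [".ds-logs-000001", ".ds-logs-000002", ".ds-logs-000003", ".ds-logs-000004"]
print(indices_to_delete(names, keep=2))
```

With rollover at 50GB and keep=2, only the oldest index is selected, which leaves about 100GB of rolled-over data plus whatever the write index currently holds.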

albertzaharovits commented 1 year ago

Hi @jackwang713 ,

Let me see if I got the ask right. Please do correct me if I got it wrong.

I understand that you want to keep the latest n indices in a rollover series (and remove the oldest one whenever a rollover produces an (n+1)th). Given that it's possible to configure the rollover action for a specific index size m, keeping the last n indices should permit keeping around roughly n * m of data.
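As a rough worked example of the n * m estimate: the rolled-over indices hold about n * m, and the current write index adds anywhere from nothing up to about m more. The sketch below assumes sizes in whole gigabytes and ignores replicas and the fact that rollover conditions are checked periodically, so an index can end up somewhat larger than m. The function name is hypothetical.

```python
from typing import Tuple


def retention_bounds_gb(n: int, m_gb: int) -> Tuple[int, int]:
    """Approximate (lower, upper) bounds on retained primary data in GB.

    Lower bound: n full rolled-over indices and an empty write index.
    Upper bound: the write index is also about to roll over at m_gb.
    """
    return n * m_gb, (n + 1) * m_gb


low, high = retention_bounds_gb(n=2, m_gb=50)
print(f"between about {low}GB and {high}GB")
```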

This sounds like a reasonable request to me, but I reckon this is a bit tricky to model because the phase transitions all rely on age rather than size. I have also found a Discuss thread on the same topic: https://discuss.elastic.co/t/index-rollover-delete-indexes/300685
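To illustrate the age-based model, here is a minimal sketch of a policy body written as a Python dict. It uses the rollover action's `max_size` condition and a delete phase gated by `min_age`. The 50gb and 60d values are taken from the example in this issue. Note that there is no field in it that expresses "keep the last n indices".

```python
import json

# Sketch of an ILM policy body: size triggers rollover in the hot phase,
# but entry into the delete phase is controlled by age (min_age) only.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_size": "50gb"}
                }
            },
            "delete": {
                "min_age": "60d",
                "actions": {"delete": {}},
            },
        }
    }
}

# The JSON body that would be sent to PUT _ilm/policy/<policy-name>.
print(json.dumps(policy, indent=2))
```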

I'm going to triage it for further opinions.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-data-management (Team:Data Management)

jackwang713 commented 1 year ago

Elasticsearch collects log data, and we use ILM to keep rapid data growth from exhausting disk space. However, an appropriate ILM policy requires manual effort and ongoing operations work: someone has to study how the data grows and tune the policy to match. Can we reduce this manual O&M and keep ES stable in the long run? Keeping a fixed number of rolled-over indices is only a workaround. DVR devices manage their data simply: when hard disk space runs low, the oldest video files are deleted automatically to free up space. Similarly, will ES in the future be able to automatically delete the oldest rolled-over index when it detects that "disk.watermark.low" has been reached?
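The DVR-style behaviour described above could be sketched roughly as follows. This is a hypothetical illustration, not something ES does today. It assumes a single node, a list of rolled-over indices ordered oldest first with known on-disk sizes, and a watermark expressed as a fraction of total disk (0.85 matches the default low watermark of 85%). In a real cluster the watermarks are evaluated per node, so the logic would be more involved.

```python
from typing import List, Tuple


def oldest_to_free(
    rolled_over: List[Tuple[str, int]],
    used_bytes: int,
    total_bytes: int,
    low_watermark: float = 0.85,
) -> List[str]:
    """Pick the oldest indices to delete until usage is at or below the watermark.

    `rolled_over` holds (index_name, size_in_bytes) pairs, oldest first,
    and is assumed to exclude the current write index.
    """
    target = low_watermark * total_bytes
    to_delete = []
    for name, size in rolled_over:
        if used_bytes <= target:
            break  # enough space has been freed
        to_delete.append(name)
        used_bytes -= size
    return to_delete


# Hypothetical numbers: a 1000-byte disk with 900 bytes in use.
candidates = [("idx-000001", 30), ("idx-000002", 30), ("idx-000003", 30)]
print(oldest_to_free(candidates, used_bytes=900, total_bytes=1000))
```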