elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

(HealthAPI) flags 0 doc indices for overdue ILM Rollover #116894

Open stefnestor opened 2 weeks ago

stefnestor commented 2 weeks ago

Elasticsearch Version

8.15.3

Installed Plugins

No response

Java Version

bundled

OS Version

ESS

Problem Description

👋 howdy, team! We've noticed that indices sitting in phase/action/step hot/rollover/check-rollover-ready are eventually flagged by the Health API if they always have docs.count: 0. We believe flagging these indices is unexpected, but would like to confirm.

Steps to Reproduce

  1. Create an empty index managed by an ILM policy with a low rollover max_age:

    PUT _ilm/policy/a
    {
      "policy": {
        "phases": {
          "hot": {
            "min_age": "0ms",
            "actions": {
              "rollover": {
                "max_age": "1s"
              }
            }
          }
        }
      }
    }

    PUT _index_template/c
    {
      "index_patterns": ["b-*"],
      "template": {
        "settings": {
          "index.lifecycle.name": "a",
          "index.lifecycle.rollover_alias": "b"
        }
      }
    }
    
    PUT b-000001
    { "aliases": { "b": { "is_write_index": true }}}
  2. Wait an unknown period of time; the index will eventually be flagged in the health report:
    GET _health_report?filter_path=indicators.ilm
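
To capture what ILM reports for the index while waiting, requests along these lines may help. They are not part of the original reproduction and assume the index name b-000001 and the b-* pattern from step 1.

    # current ILM phase/action/step for the index
    GET b-000001/_ilm/explain

    # confirm the index is still empty
    GET _cat/indices/b-*?v&h=index,docs.count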

Logs (if relevant)

None

elasticsearchmachine commented 2 weeks ago

Pinging @elastic/es-data-management (Team:Data Management)

dakrone commented 1 week ago

@stefnestor do you have the output of the indicator for this?

dakrone commented 1 week ago

I've tried reproducing this with the above code, and setting the time intervals to 1m (the defaults are 1d), but I haven't been able to get the health indicator to report it as unhealthy.
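
For reference, the 1d defaults mentioned above appear to correspond to the health.ilm.max_time_on_action and health.ilm.max_time_on_step cluster settings. This is an assumption, since the comment does not name them. A sketch of lowering them on a throwaway test cluster:

    PUT _cluster/settings
    {
      "persistent": {
        "health.ilm.max_time_on_action": "1m",
        "health.ilm.max_time_on_step": "1m"
      }
    }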

stefnestor commented 1 week ago

Thanks, @dakrone! You may be encountering https://github.com/elastic/elasticsearch/issues/113553 while testing.

This has surfaced in a couple of user issues, e.g. 01769083 (Oct19, will tag you in thread). I do not have an example output on hand, sorry, but can confirm the index just sat in hot/rollover/check-rollover-ready with incrementing retries while the Health API reported it as stuck, not as an error.