elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Improve ILM error reporting in Index Management UI #126109

Open cjcenizal opened 2 years ago

cjcenizal commented 2 years ago

This issue resulted from a conversation with @imkarrer.

Problem

Long feedback loop

When a lifecycle policy hits an error, the only way you're notified is by opening Index Management and seeing an error callout. The only way to review the indices that are encountering ILM errors is to filter on ilm.step:ERROR and then click each index to see the details of its error.
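To make the data involved concrete, here is a minimal TypeScript sketch. The Explain Lifecycle API (`GET <target>/_ilm/explain`) accepts an `only_errors` query parameter, and each failing index in its response reports `step: "ERROR"` along with `failed_step` and `step_info`. The helper `collectIlmErrors` and the `IlmErrorRow` shape are hypothetical (not Kibana code), and the interfaces model only the fields the helper reads. It flattens a response into one row per failing index, which is the information that currently requires clicking into each index.

```typescript
// Subset of the per-index entry returned by GET <target>/_ilm/explain.
// Only the fields used below are modelled; the real response has more.
interface IlmExplainEntry {
  index: string;
  managed: boolean;
  policy?: string;
  step?: string;
  failed_step?: string;
  step_info?: { type?: string; reason?: string };
}

interface IlmExplainResponse {
  indices: Record<string, IlmExplainEntry>;
}

// Hypothetical row shape for an "all ILM errors" listing.
interface IlmErrorRow {
  index: string;
  policy: string;
  failedStep: string;
  reason: string;
}

// Hypothetical helper: one row per managed index whose current step is ERROR,
// sorted by index name so the listing is stable between refreshes.
function collectIlmErrors(response: IlmExplainResponse): IlmErrorRow[] {
  return Object.keys(response.indices)
    .map((name) => response.indices[name])
    .filter((entry) => entry.managed && entry.step === 'ERROR')
    .map((entry) => ({
      index: entry.index,
      policy: entry.policy ?? '(unknown)',
      failedStep: entry.failed_step ?? '(unknown)',
      reason: entry.step_info?.reason ?? '(no reason reported)',
    }))
    .sort((a, b) => a.index.localeCompare(b.index));
}
```

Passing `only_errors=true` on the request should already narrow the payload to problem indices; the `filter` is kept as a safeguard, since that parameter may also return indices whose policy does not exist rather than only those in the ERROR step.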

How can we expose this information more immediately? The criteria for a good solution are:

Unreliable feedback loop

The Explain Lifecycle API doesn't preserve the last known error state while it's re-attempting to apply a lifecycle change. This means that as lifecycles run, the Index Management table will intermittently show a number of indices with ILM errors, and then 0 indices with errors, and then a number of indices with errors again. This creates a literal moving target for an administrator attempting to fix these errors. The ideal workflow consists of: seeing a list of all problems, trying a fix, and seeing the table update in response.

One solution could consist of updating the Explain Lifecycle API to preserve the last known error state, and to surface that in the table instead. This could result in each index having two types of ILM state: Error state (error and no error) and Running state (running and not running).
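The two-state idea can be illustrated with a small client-side sketch in TypeScript. Everything here is an assumption made for illustration: the issue proposes preserving the error in the Explain Lifecycle API itself, whereas this hypothetical `IlmStatusTracker` keeps it in memory between polls. It also assumes that a retry shows up as the index re-entering the step that previously failed, that the error can be cleared once the index reaches a different step, and that step names alone are enough to tell steps apart.

```typescript
type ErrorState = 'error' | 'no_error';
type RunningState = 'running' | 'not_running';

// One poll result for one index (hypothetical shape, not an API type).
interface IlmObservation {
  index: string;
  step: string;        // 'ERROR' or the name of the step being executed
  failedStep?: string; // set when step is 'ERROR'
  reason?: string;     // set when step is 'ERROR'
}

interface IndexIlmStatus {
  errorState: ErrorState;
  runningState: RunningState;
  failedStep?: string;
  lastError?: string;
}

// Hypothetical tracker that keeps the last known error visible during retries.
class IlmStatusTracker {
  private statuses = new Map<string, IndexIlmStatus>();

  observe(obs: IlmObservation): IndexIlmStatus {
    const previous = this.statuses.get(obs.index);
    let next: IndexIlmStatus;

    if (obs.step === 'ERROR') {
      // Halted on an error: record it and mark the lifecycle as not running.
      next = {
        errorState: 'error',
        runningState: 'not_running',
        failedStep: obs.failedStep,
        lastError: obs.reason,
      };
    } else if (previous?.errorState === 'error' && previous.failedStep === obs.step) {
      // Assumed retry of the failed step: keep the error, mark as running.
      next = { ...previous, runningState: 'running' };
    } else {
      // Index is on a different step, so the old error is treated as resolved.
      next = { errorState: 'no_error', runningState: 'running' };
    }

    this.statuses.set(obs.index, next);
    return next;
  }
}
```

With this split, a table could keep an index in its error list while a retry is in flight, and drop it only after the index moves past the failed step.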

elasticmachine commented 2 years ago

Pinging @elastic/platform-deployment-management (Team:Deployment Management)

elasticmachine commented 2 weeks ago

Pinging @elastic/kibana-management (Team:Kibana Management)