elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Better Stack Monitoring - Disk usage over data tiers #190042

Open lduvnjak opened 1 month ago

lduvnjak commented 1 month ago

Describe the feature:

Describe a specific use case for the feature:

Being able to monitor disk usage per data tier: It would be nice to be able to monitor disk usage for specific data tiers (hot, warm, cold) on large clusters with hundreds of nodes. Currently the Stack Monitoring page only displays disk usage metrics cluster-wide or for a specific node. This is generally enough for most clusters, but for clusters with hundreds of nodes and petabytes of storage it becomes rather difficult to tune the ILM policies so that each data tier is used to its maximum.

image

For example, there would also be a panel showing the available disk space and JVM heap for the hot, warm, and cold tiers respectively.

Being able to monitor a specific index or data stream across data tiers: I feel like this feature is needed a lot more. Specifically, it's about being able to monitor an index set (alias) or a data stream not only by its overall disk usage, but also by the amount it takes up on each data tier.

Let's say we have a massive dataset (netflow, firewall logs, etc.) that generates multiple terabytes of data daily. Being able to see how much that data stream takes up on the hot, warm, and cold data tiers respectively would be an amazing tool for fine-tuning ILM policies and retention.

image

For example, the storage size would also be shown for the hot, warm, and cold tiers respectively.

lduvnjak commented 1 month ago

I'd like to pop back in here with a solution I came up with for tracking disk usage per data tier:

PUT _transform/metrics-elasticsearch.stack_monitoring.node_stats-transform
{
  "source": {
    "index": [
      "metrics-*"
    ],
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "filter": [
                {
                  "bool": {
                    "should": [
                      {
                        "term": {
                          "data_stream.dataset": {
                            "value": "elasticsearch.stack_monitoring.node_stats"
                          }
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                },
                {
                  "bool": {
                    "must_not": {
                      "bool": {
                        "should": [
                          {
                            "exists": {
                              "field": "error.message"
                            }
                          }
                        ],
                        "minimum_should_match": 1
                      }
                    }
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "latest": {
    "unique_key": [
      "elasticsearch.node.name"
    ],
    "sort": "@timestamp"
  },
  "dest": {
    "index": "INDEX_NAME_HERE"
  },
  "sync": {
    "time": {
      "field": "@timestamp"
    }
  },
  "retention_policy": {
    "time": {
      "field": "@timestamp",
      "max_age": "7d"
    }
  }
}

The destination index will hold one document per node, containing that node's latest metrics. You can then create visualizations like these by filtering on a specific role in elasticsearch.node.roles:

image
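As a rough sketch of what "querying for a specific role" could look like outside of a visualization, the helper below builds a search body that filters the destination index on one role and sums the disk figures. The function name and the two filesystem field names are assumptions for illustration (check them against your own mappings); only elasticsearch.node.roles comes from the comment above.

```python
# Assumed field names; verify them against the mappings of your node_stats index.
TOTAL_FIELD = "elasticsearch.node.stats.fs.summary.total.bytes"
AVAILABLE_FIELD = "elasticsearch.node.stats.fs.summary.available.bytes"


def tier_disk_query(role, total_field=TOTAL_FIELD, available_field=AVAILABLE_FIELD):
    """Build a search body that sums disk figures for nodes holding `role`.

    `role` is an Elasticsearch node role such as "data_hot", "data_warm"
    or "data_cold". The body is meant to be sent to the transform's
    destination index, which holds one document per node.
    """
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {"term": {"elasticsearch.node.roles": role}},
        "aggs": {
            "total_bytes": {"sum": {"field": total_field}},
            "available_bytes": {"sum": {"field": available_field}},
        },
    }


body = tier_disk_query("data_hot")
```

The resulting dictionary can be passed as the request body of a `_search` call against whatever index name was used in place of INDEX_NAME_HERE.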

Keep in mind that the only filesystem info you have is the total size and the available space. This means you have to derive the used space yourself in visualizations (used = total - available), for example when calculating a usage percentage.

image
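To make the arithmetic concrete, here is a minimal sketch that derives used bytes and used percentage per tier from per-node figures. The input keys (`roles`, `total_bytes`, `available_bytes`) are simplified, hypothetical stand-ins for the real document fields, and the sample numbers are made up.

```python
from collections import defaultdict

TIER_ROLES = ("data_hot", "data_warm", "data_cold")


def usage_per_tier(nodes):
    """Sum total/available bytes per tier role and derive the used figures.

    A node that holds several tier roles is counted once in each of them.
    """
    totals = defaultdict(lambda: {"total": 0, "available": 0})
    for node in nodes:
        for role in node["roles"]:
            if role in TIER_ROLES:
                totals[role]["total"] += node["total_bytes"]
                totals[role]["available"] += node["available_bytes"]

    result = {}
    for role, t in totals.items():
        used = t["total"] - t["available"]  # used = total - available
        pct = 100.0 * used / t["total"] if t["total"] else 0.0
        result[role] = {"used_bytes": used, "used_pct": pct}
    return result


# Made-up sample data: two hot nodes and one warm node.
nodes = [
    {"roles": ["data_hot"], "total_bytes": 1000, "available_bytes": 250},
    {"roles": ["data_hot"], "total_bytes": 1000, "available_bytes": 750},
    {"roles": ["data_warm"], "total_bytes": 4000, "available_bytes": 1000},
]
usage = usage_per_tier(nodes)
# usage["data_hot"]  -> {"used_bytes": 1000, "used_pct": 50.0}
# usage["data_warm"] -> {"used_bytes": 3000, "used_pct": 75.0}
```

Summing the bytes first and dividing afterwards weights each node by its disk size, which is usually what you want for a tier-level percentage; averaging per-node percentages would give a different number when disks differ in size.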

Important: copy the mappings from the latest backing index under metrics-elasticsearch.stack_monitoring.node_stats-* and create your transform destination index with the same mappings before starting the transform, so the underlying data is mapped correctly.
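One possible way to script that step is sketched below. It assumes you have already fetched the get-mapping response for the pattern (a dictionary keyed by index name, each with a "mappings" entry) and picks the lexicographically last index name as "latest". That ordering is an assumption: it holds for date-and-generation suffixed backing indices within one namespace, but check it if you have several namespaces. The helper name and the sample index names are hypothetical.

```python
def latest_mappings(get_mapping_response):
    """Return a create-index body using the mappings of the last index.

    `get_mapping_response` has the shape of the get-mapping API response:
    {"<index name>": {"mappings": {...}}, ...}. "Last" means last in
    lexicographic order of index names, which is an assumption about how
    the backing indices are named.
    """
    if not get_mapping_response:
        raise ValueError("no indices in mapping response")
    latest = sorted(get_mapping_response)[-1]
    return {"mappings": get_mapping_response[latest]["mappings"]}


# Hypothetical, heavily shortened response for illustration only.
sample = {
    ".ds-metrics-elasticsearch.stack_monitoring.node_stats-default-2024.07.01-000001": {
        "mappings": {"properties": {"@timestamp": {"type": "date"}}}
    },
    ".ds-metrics-elasticsearch.stack_monitoring.node_stats-default-2024.08.01-000002": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "elasticsearch": {"properties": {"node": {"properties": {"name": {"type": "keyword"}}}}},
            }
        }
    },
}
create_body = latest_mappings(sample)
```

The returned dictionary can be used as the body of a create-index request for the destination index named in the transform's "dest" section.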