Better Stack Monitoring - Disk usage over data tiers

elastic / kibana

Describe the feature:

Being able to monitor disk usage per data tier.
Being able to monitor a specific index or data stream across data tiers.

Describe a specific use case for the feature:

Being able to monitor disk usage per data tier: It would be useful to monitor disk usage for specific data tiers (hot, warm, cold) on large clusters with hundreds of nodes. Currently the Stack Monitoring page only displays disk usage cluster-wide or for a single node. This is enough for most clusters, but for clusters with hundreds of nodes and petabytes of storage it becomes difficult to tune ILM policies so that each data tier is used to its full capacity.

For example, there could be a panel showing the available disk space and JVM heap for the hot, warm, and cold tiers respectively.
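As a rough illustration of what such a per-tier panel would compute, here is a minimal Python sketch. It assumes you already have one record per node with its roles and a few hypothetical numeric keys (`fs_total`, `fs_available`, `heap_max`, `heap_used`); those key names are illustrative and are not the real Stack Monitoring field names. Each node is counted once, under the hottest tier role it carries; nodes with only the generic `data` role are skipped here, even though they can hold data for any tier.

```python
from collections import defaultdict

# Tier roles from hottest to coldest; a node with several tier roles is
# counted once, under the first match.
TIER_ROLES = ["data_hot", "data_warm", "data_cold", "data_frozen"]


def tier_of(roles):
    """Return the tier name (e.g. 'hot') for a list of node roles, or None."""
    for role in TIER_ROLES:
        if role in roles:
            return role[len("data_"):]
    return None


def summarize_by_tier(nodes):
    """Sum disk and JVM heap figures per tier.

    `nodes` is a list of dicts with the hypothetical keys
    'roles', 'fs_total', 'fs_available', 'heap_max' and 'heap_used'.
    """
    totals = defaultdict(lambda: {"fs_total": 0, "fs_available": 0,
                                  "heap_max": 0, "heap_used": 0})
    for node in nodes:
        tier = tier_of(node["roles"])
        if tier is None:
            continue  # master-only, ingest-only, or generic 'data' nodes
        for key in totals[tier]:
            totals[tier][key] += node[key]
    return dict(totals)
```

Summing raw bytes per tier (rather than per-node percentages) keeps large nodes weighted proportionally when the totals are later turned into a percentage.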

Being able to monitor a specific index or data stream across data tiers: I feel this feature is needed even more. Specifically, it's about being able to monitor an index set (alias) or a data stream not only by its overall disk usage, but also by the amount of storage it takes up on each data tier.

Let's say we have a massive dataset (netflow, firewall logs, etc.) that generates multiple terabytes of data daily. Being able to see how much that data stream takes up on the hot, warm, and cold tiers respectively would be an amazing tool for fine-tuning ILM policies and retention.

For example, there could be a view showing the storage size of the data stream on each of the hot, warm, and cold tiers.
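One way to approximate this today is to combine shard placement with node roles: sum the store size of every shard belonging to the data stream's backing indices, grouped by the tier of the node holding it. The sketch below assumes the shard rows were already fetched (for example from `GET _cat/shards?format=json&bytes=b`, whose rows include `index`, `node`, and `store`) and that a node-name to roles mapping is available; the function name and the example data stream name are hypothetical. It matches `.ds-<data stream>-...` backing indices plus the `restored-`/`partial-` prefixes that ILM uses by default for searchable snapshot mounts; other renames (e.g. after a shrink) would need adding. Replica shards are counted, so the result is the total disk footprint rather than primary size.

```python
import re
from collections import defaultdict

TIER_ROLES = ["data_hot", "data_warm", "data_cold", "data_frozen"]


def tier_of(roles):
    """Return the hottest tier name found in `roles`, or None."""
    for role in TIER_ROLES:
        if role in roles:
            return role[len("data_"):]
    return None


def data_stream_bytes_by_tier(data_stream, shards, node_roles):
    """Sum on-disk bytes per tier for one data stream's backing indices.

    shards: rows shaped like `_cat/shards?format=json&bytes=b` output
    node_roles: dict mapping node name -> list of roles
    """
    # .ds-<data stream>-<yyyy.MM.dd>-<generation>, optionally mount-prefixed
    pattern = re.compile(
        r"^(?:restored-|partial-)?\.ds-" + re.escape(data_stream)
        + r"-(?:\d{4}\.\d{2}\.\d{2}-)?\d+$"
    )
    totals = defaultdict(int)
    for shard in shards:
        if not pattern.match(shard["index"]):
            continue
        node, store = shard.get("node"), shard.get("store")
        if node is None or store is None:
            continue  # unassigned shard: nothing on disk
        tier = tier_of(node_roles.get(node, [])) or "other"
        totals[tier] += int(store)
    return dict(totals)
```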

I'd like to come back to this with a workaround I came up with for tracking disk usage per data tier. It uses a latest transform over the Stack Monitoring node stats:

PUT _transform/metrics-elasticsearch.stack_monitoring.node_stats-transform
{
  "source": {
    "index": [
      "metrics-*"
    ],
    "query": {
      "bool": {
        "filter": [
          {
            "term": {
              "data_stream.dataset": "elasticsearch.stack_monitoring.node_stats"
            }
          }
        ],
        "must_not": [
          {
            "exists": {
              "field": "error.message"
            }
          }
        ]
      }
    }
  },
  "latest": {
    "unique_key": [
      "elasticsearch.node.name"
    ],
    "sort": "@timestamp"
  },
  "dest": {
    "index": "INDEX_NAME_HERE"
  },
  "sync": {
    "time": {
      "field": "@timestamp"
    }
  },
  "retention_policy": {
    "time": {
      "field": "@timestamp",
      "max_age": "7d"
    }
  }
}

The destination index will contain one document per node holding that node's latest metrics. You can then build per-tier visualizations by filtering on a specific role in elasticsearch.node.roles (for example data_hot, data_warm, or data_cold).
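To make the latest transform's behavior concrete, the Python sketch below reproduces its effect on an in-memory list of documents: for each `elasticsearch.node.name` it keeps the document with the greatest `@timestamp`, then filters nodes by a role in `elasticsearch.node.roles`. It only illustrates the semantics; the transform itself does this server-side and continuously. Documents are assumed to be nested `_source` objects with ISO-8601 timestamps, and the helper names are hypothetical.

```python
from datetime import datetime


def get_path(doc, dotted):
    """Read a nested field such as 'elasticsearch.node.name' from _source."""
    value = doc
    for part in dotted.split("."):
        value = value[part]
    return value


def parse_ts(ts):
    # datetime.fromisoformat does not accept a trailing 'Z' before Python 3.11
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_per_node(docs):
    """Keep the newest document per node name, like a `latest` transform."""
    latest = {}
    for doc in docs:
        name = get_path(doc, "elasticsearch.node.name")
        current = latest.get(name)
        if current is None or parse_ts(doc["@timestamp"]) > parse_ts(current["@timestamp"]):
            latest[name] = doc
    return latest


def nodes_with_role(latest, role):
    """Return sorted names of nodes whose latest document lists `role`."""
    return sorted(name for name, doc in latest.items()
                  if role in get_path(doc, "elasticsearch.node.roles"))
```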

Keep in mind that the only filesystem info you have is total available bytes and total bytes. This means you have to derive used space yourself (total minus available) in visualizations such as a percentage calculation.
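A minimal sketch of that derivation for one tier, assuming you already have an (available bytes, total bytes) pair per node in the tier. Bytes are summed before dividing, so the result is the tier's real used fraction; averaging per-node percentages would over-weight small nodes.

```python
def used_percent(nodes):
    """Disk used % for a group of nodes, from (available_bytes, total_bytes) pairs.

    Used space is derived as total - available, since only those two
    figures are recorded.
    """
    total = sum(t for _, t in nodes)
    available = sum(a for a, _ in nodes)
    if total == 0:
        return 0.0
    return 100.0 * (total - available) / total
```

For example, a 4000-byte node with 1000 available and a full 1000-byte node give 80.0, whereas the mean of their individual percentages would be 87.5. In a Lens visualization the same idea can be expressed as a formula of the form `1 - sum(<available field>) / sum(<total field>)`, with the placeholders replaced by your actual field names.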

!! Make sure to copy the mappings from the latest backing index of the metrics-elasticsearch.stack_monitoring.node_stats-* data stream and create your transform destination index with the same mappings before starting the transform, so the underlying data is mapped correctly.
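A minimal sketch of that bootstrap step, assuming you have the JSON response of `GET metrics-elasticsearch.stack_monitoring.node_stats-*/_mapping` (keyed by backing index name) loaded as a Python dict. It picks the newest backing index using the date and generation in its `.ds-<data stream>-<yyyy.MM.dd>-<generation>` name and returns a body to send with `PUT <your destination index>`. The function name is hypothetical.

```python
import re

# Backing indices end in -<yyyy.MM.dd>-<generation>
BACKING_INDEX = re.compile(r"-(\d{4}\.\d{2}\.\d{2})-(\d+)$")


def dest_index_body(mapping_response):
    """Return {'mappings': ...} copied from the newest backing index."""
    def sort_key(name):
        match = BACKING_INDEX.search(name)
        # Names that do not follow the pattern sort first (treated as oldest).
        return (match.group(1), int(match.group(2))) if match else ("", -1)

    newest = max(mapping_response, key=sort_key)
    return {"mappings": mapping_response[newest]["mappings"]}
```

After creating the destination index with this body, the transform can be started with `POST _transform/<transform id>/_start`.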


Better Stack Monitoring - Disk usage over data tiers #190042