elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Observability][Rules] Elasticsearch Query (ES|QL) Rule Support Group By after query execution #197902

Open BenB196 opened 4 weeks ago

BenB196 commented 4 weeks ago

Describe the feature:

I want to be able to group by (split) hits, alerts, and context into individual actions based on specific fields, similar to the Group by option in the Custom Threshold rule.

Describe a specific use case for the feature:

Suppose I have the following ES|QL query (deliberately simple, for demonstration):

FROM metrics-system.cpu-*
| KEEP host.name, system.cpu.total.pct
| WHERE system.cpu.total.pct > 80
| STATS max_cpu_pct = MAX(system.cpu.total.pct), min_cpu_pct = MIN(system.cpu.total.pct) BY host.name
| EVAL cpu_pct_diff = max_cpu_pct - min_cpu_pct
| KEEP host.name, cpu_pct_diff

Currently, when this query is used in the Elasticsearch Query rule, the rule generates only one alert and one action, because all hits are grouped under the same context.

It would be helpful to be able to tell the rule to "split" (group by) fields such as host.name, so that it generates one alert and context per hit.
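To make the requested behavior concrete, here is a minimal Python sketch of splitting an ES|QL result into one alert context per group. It assumes the JSON shape returned by the ES|QL query API (a `columns` list of `{name, type}` objects plus a `values` list of rows). The helper name `split_by_fields`, the sample values, and the context layout are hypothetical and are not how Kibana implements rules.

```python
from collections import defaultdict
from typing import Any


def split_by_fields(response: dict[str, Any], group_by: list[str]) -> dict[tuple, list[dict[str, Any]]]:
    """Group ES|QL result rows by the values of the given columns.

    Hypothetical helper: each key is a tuple of group-by values, and each
    value is the list of rows (as dicts) that belong to that group.
    """
    names = [col["name"] for col in response["columns"]]
    missing = [field for field in group_by if field not in names]
    if missing:
        raise KeyError(f"group-by fields not in result: {missing}")

    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for row in response["values"]:
        record = dict(zip(names, row))  # pair column names with row values
        key = tuple(record[field] for field in group_by)
        groups[key].append(record)
    return dict(groups)


# Illustrative (made-up) response for the simple query above.
response = {
    "columns": [
        {"name": "host.name", "type": "keyword"},
        {"name": "cpu_pct_diff", "type": "double"},
    ],
    "values": [["host-a", 12.5], ["host-b", 3.0]],
}

# Today the rule emits a single context holding every row.
# Requested: one alert (and action) per host.name value.
for key, rows in split_by_fields(response, ["host.name"]).items():
    print(key, rows)
```

Because the query's STATS ... BY host.name already emits one row per host, grouping on host.name here yields exactly one context per hit.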


Here is a slightly more realistic (and complex) example:

// FROM
// We use metrics-prometheus.collector-* as this is where Kafka metrics are stored
FROM metrics-prometheus.collector-*
// KEEP
// We only keep the fields we actually use
//  - prometheus.labels.kafka_topic - used for filtering and group by
//  - prometheus.kafka_records_produced_total.counter - used for math
//  - data_stream.namespace - used for group by
//  - service.address - used for group by
| KEEP prometheus.labels.kafka_topic, prometheus.kafka_records_produced_total.counter, data_stream.namespace, service.address
// WHERE
// 1. Filter for docs whose prometheus.labels.kafka_topic value starts with "dlq"
// 2. Filter for docs that have prometheus.kafka_records_produced_total.counter greater than 0
//   - We need to cast to a double because the field is a counter type
//   - We want greater than 0, because some docs have a 0 value that would skew the min/max math and the rule
| WHERE STARTS_WITH(prometheus.labels.kafka_topic, "dlq")
    AND TO_DOUBLE(prometheus.kafka_records_produced_total.counter) > 0
// STATS
// 1. topic_max - Compute the max value of prometheus.kafka_records_produced_total.counter within the time range
// 2. topic_min - Compute the min value of prometheus.kafka_records_produced_total.counter within the time range
// 3. Group by prometheus.labels.kafka_topic, data_stream.namespace, service.address
| STATS topic_max = MAX(TO_DOUBLE(prometheus.kafka_records_produced_total.counter)),
    topic_min = MIN(TO_DOUBLE(prometheus.kafka_records_produced_total.counter))
    BY prometheus.labels.kafka_topic, data_stream.namespace, service.address
// EVAL
// Compute the difference between the max and min counter values for each group
| EVAL topic_diff = topic_max - topic_min
// WHERE
// Filter for the alerting conditions we want
| WHERE topic_diff > 0 AND
    (
        (prometheus.labels.kafka_topic == "topic_a" AND topic_diff > 30)
        OR (prometheus.labels.kafka_topic == "topic_b" AND topic_diff > 10)
        OR (prometheus.labels.kafka_topic == "topic_c" AND topic_diff > 10)
        OR (prometheus.labels.kafka_topic == "topic_d" AND topic_diff > 30)
        OR (prometheus.labels.kafka_topic == "topic_e" AND topic_diff > 10)
        OR (prometheus.labels.kafka_topic == "topic_f" AND topic_diff > 50)
        OR (prometheus.labels.kafka_topic == "topic_g" AND topic_diff > 1)
        OR (prometheus.labels.kafka_topic == "topic_h" AND topic_diff > 6)
        OR (prometheus.labels.kafka_topic == "topic_i" AND topic_diff > 6)
    )
// KEEP
// We only need these fields for our rule/alert.
| KEEP topic_diff, prometheus.labels.kafka_topic, data_stream.namespace, service.address
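The chain of OR conditions in the final WHERE amounts to a per-topic threshold table. As a sanity check outside Elasticsearch, that last step can be sketched in Python over already-aggregated STATS rows. The `TOPIC_THRESHOLDS` dict mirrors the query. The sample rows, the addresses, and the `should_alert` name are made up for illustration. ES|QL null handling is not modeled.

```python
# Per-topic thresholds copied from the final WHERE clause of the query above.
TOPIC_THRESHOLDS = {
    "topic_a": 30, "topic_b": 10, "topic_c": 10,
    "topic_d": 30, "topic_e": 10, "topic_f": 50,
    "topic_g": 1, "topic_h": 6, "topic_i": 6,
}


def should_alert(topic: str, topic_max: float, topic_min: float) -> bool:
    """Mirror the query's EVAL and final WHERE for one STATS output row."""
    topic_diff = topic_max - topic_min       # EVAL topic_diff = topic_max - topic_min
    threshold = TOPIC_THRESHOLDS.get(topic)  # topics missing from the table never match
    return threshold is not None and topic_diff > 0 and topic_diff > threshold


# Made-up STATS output rows: (kafka_topic, namespace, service.address, topic_max, topic_min)
rows = [
    ("topic_a", "prod", "10.0.0.1:9308", 150.0, 100.0),  # diff 50 > 30: alerts
    ("topic_b", "prod", "10.0.0.2:9308", 105.0, 100.0),  # diff 5 is not > 10: no alert
    ("topic_z", "prod", "10.0.0.3:9308", 999.0, 0.0),    # topic not in the table: no alert
]
alerting = [row[:3] for row in rows if should_alert(row[0], row[3], row[4])]
print(alerting)
```

Each surviving tuple corresponds to one (topic, namespace, address) row that the query keeps, which is the unit the request wants to alert on individually.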

I want to be able to group by the fields prometheus.labels.kafka_topic, data_stream.namespace, and service.address, and have an alert and action triggered for each combination of their values.
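A group-by over several fields needs an alert identity per combination of values, so that the same combination maps to the same alert on every rule run. Below is a minimal Python sketch of that idea. The comma-joined ID format, the `alerts_per_group` name, and the sample rows are hypothetical. They are not claimed to match how Kibana's Custom Threshold rule builds its group IDs.

```python
from typing import Any

GROUP_BY = ["prometheus.labels.kafka_topic", "data_stream.namespace", "service.address"]


def alerts_per_group(rows: list[dict[str, Any]], group_by: list[str]) -> dict[str, dict[str, Any]]:
    """Build one hypothetical alert context per combination of group-by values.

    The alert ID joins the group-by values with "," in group_by order, so the
    same combination yields the same ID on every run.
    """
    alerts: dict[str, dict[str, Any]] = {}
    for row in rows:
        group = {field: row[field] for field in group_by}
        alert_id = ",".join(str(value) for value in group.values())
        context = alerts.setdefault(alert_id, {"group": group, "hits": []})
        context["hits"].append(row)
    return alerts


# Made-up rows shaped like the output of the final KEEP in the query above.
rows = [
    {"topic_diff": 50.0, "prometheus.labels.kafka_topic": "topic_a",
     "data_stream.namespace": "prod", "service.address": "10.0.0.1:9308"},
    {"topic_diff": 12.0, "prometheus.labels.kafka_topic": "topic_a",
     "data_stream.namespace": "staging", "service.address": "10.0.0.1:9308"},
]

for alert_id, context in alerts_per_group(rows, GROUP_BY).items():
    print(alert_id, context["group"])
```

Joining with a plain comma is the simplest choice but becomes ambiguous if a field value can itself contain a comma. A real implementation would need escaping or a structured key.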

elasticmachine commented 3 weeks ago

Pinging @elastic/unified-observability (Team:Observability)