elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

How to send recovered alerts according to the original alert state #160984

Open maryam-saeidi opened 1 year ago

maryam-saeidi commented 1 year ago

📝 Summary

In metric threshold and new threshold rules, we have two/three types of actions that can be generated:

As shown below, the metric threshold rule has settings to control no data behavior, but we are removing these settings from the new threshold rule. (image)

❓ Question

Suppose we have different actions for alert, warning, and no data. How can we send the recovered message to the same group that received the original alert?

| Different action states | An example scenario |
| --- | --- |
| (image) | (image) |

Previously, we had a field called originalAlertState in action context with the following logic:

const translateActionGroupToAlertState = (
  actionGroupId: string | undefined
): string | undefined => {
  if (actionGroupId === FIRED_ACTIONS.id) {
    return stateToAlertMessage[AlertStates.ALERT];
  }
  if (actionGroupId === NO_DATA_ACTIONS.id) {
    return stateToAlertMessage[AlertStates.NO_DATA];
  }
};

We don't want to save this information in alerts-as-data (AAD) for the new rule, but we are wondering how this case can be covered once conditional actions are introduced.
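To make the question concrete, here is a minimal sketch of how a recovered action could be matched against the group the alert was in before it recovered. It is not the Kibana implementation: the group ids, the `RecoveredAction` shape, the `whenRecoveredFrom` field, and both helper functions are hypothetical names made up for this illustration. Only the idea of exposing `originalAlertState` in the action context comes from the existing rule.

```typescript
// Hypothetical action group ids, for illustration only.
type ActionGroupId = "fired" | "warning" | "nodata";

// Stand-in for the labels that stateToAlertMessage would provide.
const ORIGINAL_STATE_LABEL: Record<ActionGroupId, string> = {
  fired: "ALERT",
  warning: "WARNING",
  nodata: "NO DATA",
};

// Hypothetical shape of a recovered action with an optional condition.
interface RecoveredAction {
  connectorId: string;
  // Run only when the alert recovered from one of these groups.
  // A missing or empty list means "run on every recovery".
  whenRecoveredFrom?: ActionGroupId[];
}

interface RecoveredContext {
  originalAlertState?: string;
}

// Builds the context for a recovered alert from the group it was last in.
function buildRecoveredContext(
  previousGroup: ActionGroupId | undefined
): RecoveredContext {
  return {
    originalAlertState:
      previousGroup === undefined ? undefined : ORIGINAL_STATE_LABEL[previousGroup],
  };
}

// Keeps the recovered actions whose condition matches the previous group.
function selectRecoveredActions(
  actions: RecoveredAction[],
  previousGroup: ActionGroupId | undefined
): RecoveredAction[] {
  return actions.filter((action) => {
    const groups = action.whenRecoveredFrom;
    if (groups === undefined || groups.length === 0) {
      return true; // unconditional recovered action
    }
    return previousGroup !== undefined && groups.indexOf(previousGroup) !== -1;
  });
}

// Example configuration: one recovered action per original group, plus a catch-all.
const exampleActions: RecoveredAction[] = [
  { connectorId: "pagerduty", whenRecoveredFrom: ["fired"] },
  { connectorId: "slack-warnings", whenRecoveredFrom: ["warning"] },
  { connectorId: "audit-log" },
];
```

With this configuration, an alert that recovers from the warning group would only reach `slack-warnings` and `audit-log`. The sketch assumes the previous group is known at recovery time, which is exactly the information we would prefer not to persist for the new rule.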

Use-cases

  1. Separate warning recovered message
  2. Separate Recovered action conditions for Warning and Alert (145418) -> The issue was previously solved by adding an action context variable; check the related PR for more info.

elasticmachine commented 1 year ago

Pinging @elastic/actionable-observability (Team: Actionable Observability)

elasticmachine commented 1 year ago

Pinging @elastic/response-ops (Team:ResponseOps)

maryam-saeidi commented 1 year ago

There were three topics that we discussed:

  1. How to send a recovered alert to the same action group: warning/critical (Metric threshold) or low/medium/high/critical (SLO).
     Based on @mikecote's input, this topic is more complicated than it first looks, because a warning alert can turn into a critical alert. The question is then whether we want to send multiple recovered notifications or just one when the alert recovers. @shanisagiv1 will check this case to see how we want to handle it in conditional actions. (Here is the use-case)
  2. How to handle no data?
     - @kobelb suggested changing the rule state to a warning instead of firing an alert.
     - @XavierM thinks it is not an alert and should be handled only by a notification instead of an alert.
     - Based on @simianhacker's input, users should be notified when there is no data for the alert, and with the current features we can only set a no data alert.

     I think we need to handle this case for all the rules (no strong opinion on whether to use an alert or a notification).
     Decision: I will wait for @shanisagiv1's input about when we might have a way to notify the user if a rule is in an error/warning state, and then AO will decide how to handle no data in the new metric threshold rule.
  3. Missing groups no data.
     It's on @katrin-freihofner's radar. She proposed a new rule for this use-case, and it is still under refinement.
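The escalation case in topic 1 can be sketched as a choice between two policies. This is only an illustration of the open question, not a proposal for the actual conditional actions design: the `Group` and `RecoveryPolicy` types and the `groupsToNotifyOnRecovery` function are hypothetical, and it assumes we keep an ordered history of the action groups an alert passed through while it was active.

```typescript
// Hypothetical action groups for a metric threshold alert.
type Group = "warning" | "critical";

// Hypothetical policies for the recovered notification:
// - "last-group": notify only the group the alert was in right before recovery
// - "all-notified-groups": notify every group that was notified while the alert was active
type RecoveryPolicy = "last-group" | "all-notified-groups";

// Returns the groups that should receive a recovered notification,
// given the ordered history of groups the alert passed through.
function groupsToNotifyOnRecovery(history: Group[], policy: RecoveryPolicy): Group[] {
  const last = history[history.length - 1];
  if (last === undefined) {
    return []; // the alert was never active, nothing to recover from
  }
  if (policy === "last-group") {
    return [last];
  }
  // De-duplicate while preserving first-seen order.
  const seen: Group[] = [];
  for (const group of history) {
    if (seen.indexOf(group) === -1) {
      seen.push(group);
    }
  }
  return seen;
}

// A warning alert that escalated to critical before recovering.
const escalated: Group[] = ["warning", "critical"];
const single = groupsToNotifyOnRecovery(escalated, "last-group");
const multiple = groupsToNotifyOnRecovery(escalated, "all-notified-groups");
```

For the escalated alert, `single` contains only the critical group, while `multiple` contains both warning and critical, which corresponds to the "one or multiple recovered notifications" question above.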

Please let me know if anything is captured incorrectly or if I am missing something.