[Alerting] Add a required, programmatic message to actions

Summary

Alert executors should be able to send whatever message they want when firing an action. The user-defined message should be appended to the executor's programmatic message, and the user can use this to provide additional context. This is because the information that we need to convey in an alert is often complex, dynamic, and requires product design in order to be effective.

Context

From discussions on implementing https://github.com/elastic/kibana/issues/64080, the Metrics team has realized we need to be able to have more control over what messages get sent to users. Right now the message field relies entirely on the user to configure a useful message with all relevant information, and not to delete anything that's required.

This becomes especially precarious in a case like the Logs alerts (https://github.com/elastic/kibana/pull/62806/), which have a default message of:

{{context.matchingDocuments}} log entries have matched the following conditions: {{context.conditions}}

which becomes something like:

24 log entries have matched the following conditions: message matches ASL Sender Statistics

context.conditions is a highly dynamic value, and deleting it would make the alert message effectively useless.

Because of the complexity of potential alert states, conditions, and configurations, we're exploring using something even more dynamic than context.conditions in metric alerts. One option is to remove all other context variables and write a single context.message that formats all relevant information:

The alternative would quickly get too advanced and out of hand:

(Note the condition0 naming convention, which we already use in the 7.7 release. Users have to manually add references to condition1, condition2, etc. every time they add additional conditions, and that's aggravating and error-prone. And you may notice I already made a syntax error in my pseudocode)
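To make the single-variable approach concrete, here is a minimal sketch of how an executor might build one pre-formatted context.message from any number of conditions. The `ConditionResult` shape, the `buildContextMessage` helper, and the sample metric names are assumptions for illustration, not the alerting plugin's actual API.

```typescript
// Hypothetical shape of one evaluated condition; not the real Kibana alert context.
interface ConditionResult {
  metric: string;
  comparator: string;
  threshold: number;
  currentValue: number;
}

// Build one pre-formatted string covering every condition, so the action
// template needs only a single variable no matter how many conditions exist.
function buildContextMessage(conditions: ConditionResult[]): string {
  return conditions
    .map(
      (c) =>
        `${c.metric} is ${c.comparator} ${c.threshold} (current value: ${c.currentValue})`
    )
    .join('\n');
}

// Illustrative data only.
const contextMessage = buildContextMessage([
  { metric: 'system.cpu.user.pct', comparator: '>', threshold: 0.8, currentValue: 0.93 },
  { metric: 'system.memory.used.pct', comparator: '>', threshold: 0.9, currentValue: 0.95 },
]);
console.log(contextMessage);
```

With a helper like this, adding a third condition changes only the data passed in; the template itself stays untouched, which is the main appeal over per-condition variables.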

We can implement the context.message approach with the alerting plugin today. The problem is, what happens if the user deletes context.message from their alert?

We don't want to rely on the user just realizing that they shouldn't do that.

Under this change, the purpose of the user-defined message would no longer be to manually format and present the data coming from the alert. Instead, it would provide additional context relevant to whatever the user is using alerting for: e.g. instructions for the on-call person who's getting this alert about how to respond to it.
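As a rough sketch of the proposed behavior, the required programmatic message could always be emitted, with the optional user-defined message appended after a blank line. The function name and signature below are hypothetical and only illustrate the idea.

```typescript
// Hypothetical helper; the name and signature are assumptions for illustration.
function composeActionMessage(programmaticMessage: string, userMessage?: string): string {
  const extra = (userMessage ?? '').trim();
  // The programmatic part is always present and cannot be deleted by the user.
  // The user-defined part is optional and is appended after a blank line.
  return extra.length > 0 ? `${programmaticMessage}\n\n${extra}` : programmaticMessage;
}

const composed = composeActionMessage(
  '24 log entries have matched the following conditions: message matches ASL Sender Statistics',
  'On-call: check the runbook before escalating.'
);
console.log(composed);
```

Because the user text is only ever appended, clearing it leaves the alert's own message intact.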

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

Trying to boil down the requirements here - seems like there's a desire for two messages - one coming from the alert, which may be non-trivial (contain lists of things) - and one that could be set in the action params when editing the alert, specific to the usage of that alert. The customer would see both - presumably the one from the alert, followed by the one set in the action params - in an email/slack message, separated by a blank line.

I've been kind of thinking about something like this in reference to figuring out how to have a notification that would include the result of another action. E.g., a hypothetical GitHub issue action that would create an issue. You'd like to get the issue number / url from that action, and add it as another part of the message. Maybe at the bottom?

At some point we need to look into better Slack messaging, which means using their "blocks" stuff. Perhaps we can settle on a generic shape that looks similar, and for messaging systems that don't have "blocks" like this, we just do the best we can - eg, join the blocks with a blank line between them.
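One way to picture the generic shape and its fallback is sketched below. The `MessageBlock` type is a deliberately simplified, hypothetical stand-in (Slack's real block schema is richer), and `renderBlocksAsPlainText` is an assumed helper name.

```typescript
// Hypothetical, simplified block shape; not Slack's actual schema.
type MessageBlock =
  | { kind: 'header'; text: string }
  | { kind: 'section'; text: string };

// Fallback for messaging systems without native blocks:
// drop empty blocks and join the rest with a blank line between them.
function renderBlocksAsPlainText(blocks: MessageBlock[]): string {
  return blocks
    .map((block) => block.text.trim())
    .filter((text) => text.length > 0)
    .join('\n\n');
}

const plainText = renderBlocksAsPlainText([
  { kind: 'header', text: 'Log threshold alert' },
  { kind: 'section', text: '24 log entries have matched the following conditions' },
  { kind: 'section', text: 'On-call: check the runbook before escalating.' },
]);
console.log(plainText);
```

An action type that does understand blocks could map the same array onto its native structure instead of flattening it.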

The other thing to think about, as these messages get more complex, is the formatting supported by the various actions. Today we have plain text for most services, but Slack messages can use THEIR version of markdown-like markup, and for email we expect the message to be a more typical version of markdown - and there are differences. How should an alert render a message so that it can be consumed by either? Should it create a "slack" version of a message, and a "markdown" version? Is markdown good enough to use in plain text situations as well? A simple hack is to expose context variables like message_slack and message_markdown, and then let the action executor figure out which of the message* variables to use. Or expose all of them and let the customer decide.
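The "simple hack" could look roughly like the sketch below: the alert exposes several renderings of the same message and the action executor picks the one it understands, falling back to plain text. The `MessageVariants` interface, the `ActionTarget` labels, and `pickMessage` are all hypothetical; only the message_slack and message_markdown variable names come from the discussion above.

```typescript
// Hypothetical context shape: a plain-text message is always present,
// format-specific variants are optional.
interface MessageVariants {
  message: string;
  message_slack?: string;
  message_markdown?: string;
}

// Hypothetical labels for the kind of action being executed.
type ActionTarget = 'slack' | 'email' | 'server-log';

// Pick the variant the target can render, falling back to plain text.
function pickMessage(target: ActionTarget, variants: MessageVariants): string {
  switch (target) {
    case 'slack':
      return variants.message_slack ?? variants.message;
    case 'email':
      return variants.message_markdown ?? variants.message;
    default:
      return variants.message;
  }
}

// Slack's markup uses single asterisks for bold; typical markdown uses double.
const variants: MessageVariants = {
  message: 'CPU usage is above threshold',
  message_slack: '*CPU usage* is above threshold',
  message_markdown: '**CPU usage** is above threshold',
};
console.log(pickMessage('slack', variants));
console.log(pickMessage('email', variants));
console.log(pickMessage('server-log', variants));
```

The fallback keeps alert types that only produce plain text working with every action type, at the cost of the alert having to render the same content more than once when it wants richer output.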

> seems like there's a desire for two messages - one coming from the alert, which may be non-trivial (contain lists of things) - and one that could be set in the action params when editing the alert, specific to the usage of that alert. The customer would see both - presumably the one from the alert, followed by the one set in the action params - in an email/slack message, separated by a blank line.

Yep, that's about what I was thinking.

As for action types that are more complex than plain text, I feel like that makes having an opinionated message from the alert even more important. Slack blocks, especially, feel like they could benefit from specific product design choices. For Metrics, just basing off what Datadog does (which is admittedly where I'm basing most of my alerting opinions), we might want to include a thumbnail of a graph, a different color depending on how far the metric has crossed over the threshold, links to the metric explorer, and several other things that would be difficult to build a user-facing UI to customize.

That level of complexity could benefit emails too, if we want to start sending rich HTML.

IMO there's a large subset of action types which are basically, in some way, shape, or form, "send an alert message." Whether it's a server log, an email, a Slack message, a PagerDuty message, we can cover most bases with:

- Let the user edit a plain text Title and a Message with reasonable defaults and enable some {{context variables}}
  - Title covers email subject, Slack block heading, etc.
- Have the alert type handle styling, formatting, rich features, and non-trivial information.
  - For server logs, this just means generating a text string explaining what happened in the alert
  - For Slack messages and emails, we can decide to convey some of this information with graphics instead of the same text string

On the other hand, there are some action types that don't fit the bill of "send an alert message," like creating a GitHub issue in response to an alert. That's something a little more complicated that I don't have a frame of reference for.

elastic / kibana

[Alerting] Add a required, programmatic message to actions #64349
