[Synthetics] Alert Triggering at Point of Test Failure

drewpost commented 1 year ago

For the synthetics use case, customers require notification of issues with their monitors as close to instantaneously as possible. The current alerting framework available to the app only allows alerting via a look-back window. This delays sending an alert and impacts our end users' experience.

We want to be able to trigger an alert from an event rather than from a time-based look-back window. This would enable the following user flow: a Synthetic Monitor executes a scheduled or ad-hoc test run -> the test run fails -> the monitor moves into an error state and a new error event is opened -> an alert is triggered. The time between each step of that flow should be kept to the absolute minimum.
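
A minimal sketch of that event-driven flow, for illustration only. It is written in Python for brevity and is not Kibana or Synthetics code: `Monitor`, `ErrorEvent`, `record_test_run`, and the `on_alert` hook are all hypothetical names. The point it shows is that the alert fires inside the same call that processes the failing test run, with no look-back window in between.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ErrorEvent:
    monitor_id: str
    opened_at: float                    # timestamp of the failing test run
    closed_at: Optional[float] = None   # set when the monitor recovers


@dataclass
class Monitor:
    monitor_id: str
    on_alert: Callable[[ErrorEvent], None]  # hypothetical alert hook
    state: str = "up"
    events: List[ErrorEvent] = field(default_factory=list)

    def record_test_run(self, succeeded: bool, timestamp: float) -> None:
        """Handle the result of one scheduled or ad-hoc test run."""
        if not succeeded and self.state == "up":
            # up -> error: open a new error event and alert immediately
            self.state = "error"
            event = ErrorEvent(self.monitor_id, opened_at=timestamp)
            self.events.append(event)
            self.on_alert(event)
        elif succeeded and self.state == "error":
            # error -> up: close the currently open error event
            self.state = "up"
            self.events[-1].closed_at = timestamp


alerts: List[ErrorEvent] = []
monitor = Monitor("checkout-page", on_alert=alerts.append)
monitor.record_test_run(succeeded=True, timestamp=0.0)
monitor.record_test_run(succeeded=False, timestamp=60.0)   # alert fires here
monitor.record_test_run(succeeded=False, timestamp=120.0)  # same error event, no new alert
monitor.record_test_run(succeeded=True, timestamp=180.0)   # error event is closed
```

In this sketch only the transition into the error state raises an alert, so repeated failures within the same error event do not re-notify.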

elasticmachine commented 1 year ago

Pinging @elastic/uptime (Team:uptime)

andrewvc commented 10 months ago

After going through this with @kobelb, we have two options:

1. Decrease the poll interval from every 60s to every 30s in the current alert
2. Find a way to immediately trigger an alert when an error occurs

Practically speaking, option 1 is far more feasible in the short term. Option 2 would require rethinking good chunks of our Fleet integration, the synthetics service, and Kibana alerting.

One barrier to option 1 is that serverless currently only supports a 60s minimum schedule, but they could make an exception for this alert.
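
As a rough illustration of what shortening the poll interval buys, the sketch below computes how long a failure waits before the next poll picks it up. It assumes polls tick at fixed multiples of the interval and ignores ingestion lag and rule execution time, so the numbers are a lower bound on real detection delay. `detection_delay` is a hypothetical helper, not part of Kibana alerting.

```python
import math


def detection_delay(failure_time: float, poll_interval: float) -> float:
    """Seconds from a failure until the next poll tick.

    Assumes polls run at t = 0, poll_interval, 2 * poll_interval, ...
    A failure that lands exactly on a tick is treated as seen at once.
    """
    next_poll = math.ceil(failure_time / poll_interval) * poll_interval
    return next_poll - failure_time


# A test run fails one second after the poll at t=60.
failure_time = 61.0
delay_60s = detection_delay(failure_time, 60.0)  # next poll at t=120
delay_30s = detection_delay(failure_time, 30.0)  # next poll at t=90
```

Under these assumptions the worst-case wait approaches one full interval, so moving from 60s to 30s roughly halves it, while an event-triggered alert would remove the wait entirely.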

elasticmachine commented 2 months ago

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

elastic / kibana

[Synthetics] Alert Triggering at Point of Test Failure #149227