elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Response Ops][Alerting] Create FAAD API for use by rule type executors #145103

Closed ymao1 closed 1 year ago

ymao1 commented 1 year ago

As part of Phase 2 of framework alerts-as-data, we need to provide an API for rule type executors to report alerts that will be written out as FAAD documents.

POC for possible implementation here. This issue would cover the AlertsClient portion of the POC.

Rule type executors will have to opt into using the new API, which should provide the same services as the existing AlertsFactory (e.g., recovered alerts determination, alert limit checking, and categorizing into new/active/recovered alerts) as well as writing out alert documents. The existing AlertsFactory should be deprecated but not removed.
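To make the services listed above concrete, here is a minimal sketch of how reported alerts could be categorized into new/active/recovered with an alert limit applied. All names (`categorizeAlerts`, `CategorizedAlerts`, `maxAlerts`) are hypothetical and are not taken from the Kibana codebase, and the way the limit interacts with recovery is an assumption made for illustration only.

```typescript
// Hypothetical sketch: none of these names come from the Kibana alerting framework.
type AlertId = string;

interface CategorizedAlerts {
  new: AlertId[]; // reported in this run but not active in the previous run
  active: AlertId[]; // everything reported in this run (capped at the limit), including new alerts
  recovered: AlertId[]; // active in the previous run but not reported in this run
  hasReachedLimit: boolean;
}

function categorizeAlerts(
  previouslyActive: ReadonlySet<AlertId>,
  reported: readonly AlertId[],
  maxAlerts: number
): CategorizedAlerts {
  // De-duplicate while preserving the order in which alerts were reported.
  const uniqueReported = Array.from(new Set(reported));
  const reportedSet = new Set(uniqueReported);

  // Assumption: alerts beyond the limit are dropped from this run's active list.
  const hasReachedLimit = uniqueReported.length > maxAlerts;
  const active = uniqueReported.slice(0, maxAlerts);

  return {
    new: active.filter((id) => !previouslyActive.has(id)),
    active,
    // Assumption: recovery is computed against everything reported, so alerts
    // truncated by the limit are not mistaken for recovered ones.
    recovered: Array.from(previouslyActive).filter((id) => !reportedSet.has(id)),
    hasReachedLimit,
  };
}
```

For example, with `a` and `b` previously active and `b`, `c`, `d` reported under a limit of 2, this sketch yields `c` as new, `b` and `c` as active, `a` as recovered, and flags the limit as reached.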

elasticmachine commented 1 year ago

Pinging @elastic/response-ops (Team:ResponseOps)

ymao1 commented 1 year ago

This is a large issue, so it can be started, but a complete implementation may be blocked by https://github.com/elastic/kibana/issues/145100

ymao1 commented 1 year ago

This issue will need to be broken down into multiple steps to ensure that the new API has the same functionality as the existing implementation. While planning for this issue, I found it useful to encapsulate all alert related functionality in the alerting task runner into a LegacyAlertsClient. https://github.com/elastic/kibana/pull/148751

Here's a recommendation for how we can break this down:

1. Create new AlertsClient that works with alerts-as-data

This new client should eventually replace the LegacyAlertsClient, so it should contain all the current framework functionality:

In addition, the client should check to ensure the context-specific resources have been installed prior to rule execution and retry installation if they have not been.

Additionally, we should try to proxy the AlertsClient with the LegacyAlertsClient so that, if a rule type has registered an alert context with the framework but has not yet updated its executor to use the new AlertsClient, FAAD docs will still be written with just the common framework-level fields.
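One way to picture this fallback is a helper that builds documents from the alerts tracked by the legacy client and merges in a rule-type payload only when the executor has reported one. This is a rough sketch under stated assumptions: the field names (`alertId`, `status`, `ruleId`), the `LegacyAlert` shape, and `buildAlertDocuments` are all hypothetical and do not reflect the actual FAAD schema or client API.

```typescript
// Hypothetical sketch: field names and shapes are illustrative, not the real FAAD schema.
interface FrameworkAlertFields {
  alertId: string;
  status: 'active' | 'recovered';
  ruleId: string;
}

type AlertDocument = FrameworkAlertFields & Record<string, unknown>;

// Assumed minimal view of what the legacy client already tracks per alert.
interface LegacyAlert {
  id: string;
  status: 'active' | 'recovered';
}

function buildAlertDocuments(
  ruleId: string,
  legacyAlerts: readonly LegacyAlert[],
  // Payloads reported through the new API; empty for executors that have not migrated.
  reportedPayloads: ReadonlyMap<string, Record<string, unknown>>
): AlertDocument[] {
  return legacyAlerts.map((alert) => ({
    // Rule-type-specific fields, if the executor reported any for this alert.
    ...(reportedPayloads.get(alert.id) ?? {}),
    // Framework-level fields are applied last so a payload cannot override them.
    alertId: alert.id,
    status: alert.status,
    ruleId,
  }));
}
```

With an empty payload map, each document contains only the framework-level fields, which is the behavior described above for rule types that have registered a context but not yet adopted the new client.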

It may be helpful to split the flapping portion out from this step so we can ensure things work as expected with the FAAD documents. For example, we now store recovered alert history information in the task manager state to support flapping detection. What is the equivalent in the FAAD doc?

2. Update action scheduling to work with FAAD

Currently, action scheduling is handled by the ExecutionHandler class, which takes active and recovered (legacy) alerts that contain context and state information. Updates will need to be made so:

3. Update AlertsClient to perform lifecycle executor functionality

Finally, in order to deprecate the lifecycle executor from the rule registry, we need to absorb the functionality currently provided by that executor. Some of that functionality is already duplicating what the framework does (for example, recovery calculation) but some of it is not available in the framework.

ymao1 commented 1 year ago

After some discussion, we are going to take a more incremental approach to creating this new FAAD API. For the first step, we will be creating an AlertsClient that proxies much of the functionality of the LegacyAlertsClient.

We will:

The new AlertsClient:

We will create (many) followup issues for the subsequent steps after this initial issue is complete.

ymao1 commented 1 year ago

While working on the PR for this issue, I found myself blocked by this issue for moving alert UUID generation to the framework. That issue was put on hold as not needed, but we will be reviving it and getting it resolved before moving forward with the PR for this issue.

ymao1 commented 1 year ago

Closing in favor of https://github.com/elastic/kibana/issues/156442 and https://github.com/elastic/kibana/issues/156443