[Discuss] [Security Solution] [Alerting] HTTP route RFC for unified rule management

dhurley14 commented 3 years ago

Introduction

RFC to define the needs of the security solution for a unified rule management API. Not sure if something like this exists elsewhere but figured I'd throw this together.

Security routes to be added to alerting framework

We make use of bulk actions https://github.com/elastic/kibana/issues/53144 such as bulk create, bulk update (bulk enable essentially), bulk delete.. so we need some new routes to handle these operations. These routes also require creating additional apis on the alerts client to enable bulk operations on the saved objects which currently (as of 7.12 I believe) have all of these operations available except for bulk delete.

We also need import export ~but it looks like @ymao1 has been working away at that which is great :) https://github.com/elastic/kibana/issues/94151 and https://github.com/elastic/kibana/issues/94152~

Privileges and index bootstrap routes are used for establishing the .siem-signals index which will be the unified alerting index. I think Garrett is handling this though but if anyone has additional comments those would be welcome.

Prepackaged rules route - Not sure how this will look in 7.13 after this lands

Find rules status route is undocumented but maybe should be documented? Also do we want this in the security solution routes or in alerting? My feeling is it belongs in the security solution but I could see utility in having it be available for other solutions too. This could also exist in the get framework health route from alerting.

We also need a route for updating tags but I'm not sure if tags are security-specific so this route could remain in the security solution.

Lastly, new routes for adding actions and enabling / disabling our pre-built rules (tagged with __immutable__) Also add validation to prevent updating immutable rules too.

Route validation

Currently alerting is utilizing kbn config for route validation. If possible, it would be great if the rule management api and the solutions utilized the same validation library. My suggestion would be to introduce io-ts in the newer apis like import / export, bulk etc. and then gradually update the other routes to utilize it as well.

Open questions

We currently merge the rule status saved object with the rule data in our "find_rules" route. Hopefully this status saved object will be removed in 7.13 and be replaced with events in the event log plugin. Regardless, the _find route returns the rule with some current status information merged into the response. Either we copy this functionality over into the _find route of alerting or we tell the UI to query the rule health / rule status api. The easier option is to have the same functionality as currently exists in the security solution route.

elasticmachine commented 3 years ago

Pinging @elastic/security-detections-response (Team:Detections and Resp)

ymao1 commented 3 years ago

The initial implementation of import/export for the alerting framework will be done via the saved object management UI, so I'm not sure it will exactly meet your needs in this iteration. We are addressing the need for importing/exporting rules and connectors in order to support the deprecation of legacy multi-tenancy, so the expected workflow will be that a user will use the saved object management UI or the saved object import/export routes/APIs. In the next step, we would expect to provide routes/APIs in the alerting framework for import/export that provide an RBAC layer on top of the saved objects API but there is no issue open for that right now.

elasticmachine commented 3 years ago

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

banderror commented 3 years ago

Just for reference, these are our current endpoints and route handlers:

I think one thing to note is that our routes are not simple wrappers around AlertsClient and data specific to only alerting. They contain pieces of code specific to the Detections domain, for example:

ML https://github.com/elastic/kibana/blob/041566d85aaf24f61bc4304cbd978f0aa00ea9d7/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/create_rules_route.ts#L72
Checks for existence of indices specified in rule params https://github.com/elastic/kibana/blob/041566d85aaf24f61bc4304cbd978f0aa00ea9d7/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/create_rules_bulk_route.ts#L95
Updating our custom saved objects representing "throttled" rule actions https://github.com/elastic/kibana/blob/041566d85aaf24f61bc4304cbd978f0aa00ea9d7/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/create_rules_bulk_route.ts#L111
Creating exceptions list for Elastic Endpoint rule https://github.com/elastic/kibana/blob/041566d85aaf24f61bc4304cbd978f0aa00ea9d7/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/create_rules_route.ts#L92
etc etc

So the big question imho is how are we going to separate responsibilities and design the final framework (whatever it is). Is it going to be only Alerting, or Alerting + ETL on top, or Alerting + unified Detection Engine, or... ? That said, if/when this is more clarified, would be definitely good to consolidate HTTP APIs for rules, but at this point it seems like too early to make any decisions on routes.

Maybe it would be worth going through all our routes and figure out all the generic (alerting) and specific (detections) parts.

Another idea is let's start with adding some missing APIs to alerting (methods to AlertsClient etc) that we've been needing for a long time. This is a meta ticket.

I think bulk CRUD methods would be a good thing to add because of 2 reasons:

There's a lot of bulk CRUD code in our API routes, and sometimes it's implemented like await Promise.all() or something like that which doesn't scale with the number of items to process, and we already had issues with that. We would benefit from extracting this mechanics to Alerting and implementing it properly.
We will need bulk CRUD stuff after my efforts with Event Log are done -- in order to optimize our rules/_find and rules/_find_statuses endpoints.

Lastly, I was thinking about leveraging data plugin for creating an RPC infrastructure -- just like they do it in Threat Hunting. They can add a server-side function and call it from the client app -- without introducing new HTTP endpoints. HTTP endpoints as they are now are public (I'd say even "published"), meaning that when we add a new endpoint we agree on maintaining it and avoiding breaking changes untill the next major. This is hard, and would be good to have an option to bypass this in those situations where an app needs a new app-specific "endpoint", but it's not generic/useful enough to publish it for all users of Kibana. This is not super related to this RFC, but I'd still like to mention this idea here.

ymao1 commented 3 years ago

Addressing the immediate ask of bulk rule actions:

Bulk create/delete - Currently single create and delete functions will throw an error if any step of the create/delete process fails. Assume the bulk version of these functions should not throw an error if any one operation fails, rather it should return an array of responses, either success or errors with reason?
Bulk get - This was not mentioned in the issue description - is that because it's not needed or because it's a lower priority?
Bulk update - This was only mentioned in the context of bulk enabling/disabling in the issue description. Is this because we are considering rules to be immutable and therefore will never be updating them? Is the ask then that there's a bulkEnable and bulkDisable function on the alerts client? Note that currently, enabling/disabling a rule updates the updatedBy and updatedAt fields for a rule. Does that violate rule immutability?

Agree with @banderror that there is a lot of security specific functionality in the security routes outside of purely calling the alerts client so providing these bulk actions will not immediately remove the need for the security routes since there are additional. Out of the additional actions that are performed before and after calling the alerts client, do we know which ones might be needed by observability as well? That might help guide the alerting framework as to what might belong in the alert client.

ymao1 commented 3 years ago

Find rules status route is undocumented but maybe should be documented? Also do we want this in the security solution routes or in alerting? My feeling is it belongs in the security solution but I could see utility in having it be available for other solutions too. This could also exist in the get framework health route from alerting.

Are you suggesting that the rules with error status should be returned as part of the /api/alerting/_health endpoint in addition to whether there are any rules in an error state?

Lastly, new routes for adding actions

Is this route requested because security creates a separate rule for notifications instead of including the actions as part of rule creation? What would be the requirements of this new route?

gmmorris commented 3 years ago

Find rules status route is undocumented but maybe should be documented?

I believe this omission is intentional - this is an API we consider internal, open to change and hence not on the public docs. While we do want other teams in Elastic to use it, as we can work closely with them when a change is needed, we don't want this on external docs. @mikecote can confirm whether I'm right about that. 😬

The Platform leads are in the process of formalising clearer guidance around the distinction between internal/external apis and what manner of documentation we should provide for each. Hopefully this'll be clearer once we finish that work 😄 . (cc @kobelb )

ymao1 commented 3 years ago

Find rules status route is undocumented but maybe should be documented?

I believe this omission is intentional - this is an API we consider internal, open to change and hence not on the public docs.

I thought this was in reference to the find rule status route inside security solutions and whether this should be lifted up into the alerting framework?

gmmorris commented 3 years ago

Find rules status route is undocumented but maybe should be documented?

I believe this omission is intentional - this is an API we consider internal, open to change and hence not on the public docs.

I thought this was in reference to the find rule status route inside security solutions and whether this should be lifted up into the alerting framework?

Oh, sorry, probably my bad. 😬 Ignore me 😅

dontcallmesherryli commented 3 years ago

@oatkiller assigning this to you to start grooming/planning for this ticket.

elastic / kibana