[Security Solution] Allow users to edit concurrent_searches and items_per_search field for custom IM rules

approksiu commented 8 months ago

Epics: https://github.com/elastic/security-team/issues/1974 (internal), https://github.com/elastic/kibana/issues/174168

Summary

We need to expose the concurrent_searches and items_per_search fields of Indicator Match rules on the Rule Creation/Editing pages and allow editing them. These should be fields with default values in the UI, which should correspond to the default values that are currently used in the API if the fields are not passed in rule create/update requests.

Background

We want to allow users to easily specify the prerequisites for their custom rules.

User story

[ ] As a user, I want to be able to specify concurrent_searches and items_per_search fields for my custom Indicator Match rules.

Acceptance criteria

[ ] Rule edit page shows the concurrent_searches and items_persearch field in the About Rule > advanced settings section for Indicator match rules.
[ ] The default setting is present in those fields if applicable.
[ ] User can edit these fields. User is informed about min/max/default values.
[ ] On the rule edit page, there is a tooltip explaining what this field does, (and a link to documentation?).
[ ] When the Elastic prebuilt rule is duplicated - the value should be kept.

Question: what values would these fields contain - what UI component would make sense?

Designs

Figma file

Release progress

[ ] Initial implementation is done. (PR)
[ ] Feature is covered with automated tests.
[ ] Feature is fully implemented and considered by the development team as ready to be released.
[ ] Acceptance testing is done and the feature is approved by @approksiu and @ARWNightingale.
[ ] Exploratory testing is done and the feature is approved by TBD (@dplumlee will find an engineer who will do it).
[ ] Documentation is written for ESS and Serverless by @joepeeples. Two docs PRs are approved and ready to be merged.
[ ] Feature is released in Serverless.

Planned release date in Serverless: TBD. Planned release date in ESS: TBD (v8.x.0).

elasticmachine commented 8 months ago

Pinging @elastic/security-detections-response (Team:Detections and Resp)

elasticmachine commented 8 months ago

Pinging @elastic/security-solution (Team: SecuritySolution)

elasticmachine commented 8 months ago

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

jpdjere commented 7 months ago

@approksiu cc @banderror @nikitaindik @dplumlee @joepeeples

concurrent_searches and items_per_search are not defaultable fields, so when a IM rule is created these fields are not initialised if not passed explicitly. However, when the rule runs, the values default to the following numbers if undefined in the rule:

concurrent_searches: 1
items_per_search: 9000

It looks like, from the Acceptance Criteria, we will like to make these fields defaultable to those values, and use those default values in the Rule Creation UI as well. Sounds right?

Regarding the tooltip/explanation in UI/docs, there's no reference to the fields in the docs that I can find, and I feel we need to explain pretty clearly how these fields work; they could otherwise lead to degraded performance and many SDHs, if wrongly configured.

In this discussion I found a pretty good explanation of how the fields work. Hopefully @joepeeples can help us find a way of explaining this clearly and succintly.

jpdjere commented 6 months ago

Hi @rylnd @nkhristinin

As preparation for the Prebuilt Rules Customization Milestone 3, we'll be working on making the concurrent_searches and items_per_search fields of IM rules editable via the UI.

Allowing users to edit this, we think, opens up a lot of surface area for misconfigurations of the rule, performance issues and the possibility of many SDHs. Therefore, we'd like to ask the Detection Engine team, as owners of the execution of the rule, if you could provide us with some guidance, recommendations of reasonable ways to limit the editing of this values.

This discussion dives into how the fields work, but hopefully could give us your opinion on minimum/maximum values, or ranges of values that should be valid, maybe even depending on what the other value is.

We are seeing two ways of enforcing this limits:

Using Zod validation in the requests, simply failing creation/editing requests with 400 when the number is invalid.
Accepting any value, but, if invalid, replacing it with a default minimum/maximum in the request handler.

What do you think?

Thanks for your help. cc @dplumlee who will be working on this feature

rylnd commented 6 months ago

@jpdjere In my opinion it's best to stick to objective statements and examples rather than trying to be prescriptive. The issue with addressing a specific scenario (e.g. "If your rule is timing out but resource utilization is low,") is that it's very fuzzy (there are lots of definitions for "timing out" and "low") and subjective, and there are lots of permutations that we really can't/shouldn't try to enumerate.

Instead, I would objectively describe the configuration, and provide some illustrative examples of what changing that configuration would do, quantitatively. Most of that is already written up as Indicator Match Troubleshooting, so I think it would just be a matter of converting that to user-facing information.

dplumlee commented 6 months ago

@jpdjere if we end up not putting any hard limits on these and instead are adding a lot of description surrounding what can happen if you start messing with these rule configurations, perhaps we can just display those warnings in the form if the numbers are set above the default values. I see we already have minimum values of 1 for each of the fields defined in the zod schema validation.

jpdjere commented 6 months ago

Thanks @rylnd!

So @dplumlee, let's ping @approksiu and @ARWNightingale for their approval here: based on Ryland's comment, we would add edit capability to these two features without setting a max value for them on the API or serverside layers.

Instead, we'll provide as much user-facing information on the form as possible. The KB link provided above has definitely good info:

items_per_search: Defines how many indicators we load into memory at a time. Defaults to 9000. A larger number will increase memory usage and the size (in characters) of the elasticsearch query, while reducing the total number of queries performed.

concurrent_searches: Defines how many parallel/concurrent searches are performed at a time. Defaults to 1. A larger number will increase the number of searches performed while reducing the size of each individual query. As an additional guideline, it is recommended that one follows the equation described in the analogous max_concurrent_searches parameter to determine an appropriate value.

Let us know what you think.

@dplumlee Kseniia is away this week, so I'd say we can start development while we reach a decision on this.

nkhristinin commented 6 months ago

My concern, that concurrent_searches can be misleading for users.

From the reading description above I can have idea that increase in concurrent_searches should lead to faster rule execution, as we do searches in parallel.

In real life it can be far of true, but probable lead to slower rule execution.

How IM rule works:

To understand why, we need to know how IM rule works.

We do have 2 indices indexA and indexB.

The rule execution is a loop, when we try to query all documents from one index until we find matches.

We query batch of documents from indexB. Usually 9k, or (per_page which we show later how is calculated)
From each batch, we create a speical query into indexA to find matches with documents from the indexB.
if we find 100 alerts, or indexB has no more documents to query we stop, otherwise go to step 1.

The main confusion can be that with concurrent_searches we can lead users to think that we parallel step 1, but un reality we parallel step 2.

Validation

Also in the code we have this line

 const perPage = concurrentSearches * itemsPerSearch;

perPage - is amount of documents we are query. But with current implementation if this value more than 10k, we will have an error.

So we should, probably make some validation in UI, to be sure that it less than 10k.

dplumlee commented 6 months ago

@joepeeples @ARWNightingale I started building this out and was wondering what y'all's opinion was on how we display any help text and the relevant errors - specifically addressing what Nikita talks about above with the concurrentSearches * itemsPerSearch value having to be less than 10k.

Help Text

Do we want any help text below these? In the design it lists the maximum values for each but as we've discussed elsewhere in this ticket, the API won't have a max value on either one other than the 10k multiplied value.

Errors

Right now I have the same error displaying below both components as it affects both of them but it looks a bit funky imo. The language could obviously be improved with help from @joepeeples, @ARWNightingale do you think a unified error message for this would be a better solution?

I have the minimum validation added on each as well, this is what it looks like.

dplumlee commented 6 months ago

@jpdjere re: the concurrentSearches * itemsPerSearch cannot be greater than 10k validation, what do you think the best way to handle that is for separate defaultable fields? Should we add in validation outside of zod to all the routes that use the respective field OR its default value to calculate whether or not a request body is valid? (e.g. concurrent_searches ?? 1 * items_per_search ?? 9000 <= 10000)

This would cover both the case where a user inputs both fields and the multiplied value is higher than 10k and the case where they only enter one of the fields, but the multiplied value with what we default to is still over 10k.

EDIT: turns out we currently validate on rule creation (and only rule creation) to have both of them in the request body or neither of them. Since we're changing these fields to be "defaultable" in this PR, we could either stick with this logic and have them both be required in the front end, or we could change to let them both be defaultable separately and follow what I was saying above.

yctercero commented 6 months ago

@dplumlee @jpdjere could we put together a design doc (could just be a google doc or github issue) for opening up these fields? The "design" part being more a layout of the anticipated risks and any potential mitigations. Performance is a big concern here - IM rules already often encounter issues around this - particularly in Serverless. If we open these fields up and start getting SDHs from users who play around with this - will there be anything being logged to alert us that these value have been modified from their default values for easy triage?

We have previously received feedback that we offer users too many ways to get themselves in bad configurations, and I'm afraid that's the case here. From Nikita's concerns - it seems users would need some level of technical knowledge about the workings of the rule to properly configure these. As we open up fields like these and max_signals - should there be a new section in the rule creation/edit flow that makes it clear these fields should only rarely be adjusted and they do so at the risk of negatively affecting performance? Something more than just an min/max validation.

cc @marshallmain for technical input

jpdjere commented 6 months ago

We should have some product input here @approksiu. We can discuss this in the next Simplified Protections meeting, but for context:

We need to make this two fields editable in the UI in preparation for Prebuilt Rules Customization. Both of these fields are already customizable via API.
We're dealing with highly technical fields which have a rather complex functionality. Also, they depend highly on one another during rule execution. See comment above.
Modifying these rules gives user's lots of surface area to "shoot themselves in the foot" regarding performance of IM rules. Given the amount of SDHs we've had in the past related to performance of IM rules, this is a situation we'd like to avoid aggravating as much as possible.
From the Rule Management team we were planning making the fields editable adding in the UI enough warnings and explanations about the potential dangers and possible negative effects in performance, but the Detection Engine justifiably raised a flag here about maybe that not being enough. See @yctercero proposal's above.

What do you think?

jpdjere commented 6 months ago

Closing ticket after discussion with @approksiu and Detection Engine team:

Fields are barely used; was introduced principally to assist us with dealing with SDHs
Fields are editable via API but are actually not documented in the API docs; preventing their update would not signify a breaking change
If exposed in the UI or added to API docs, it would be extremely difficult to explain how the fields work to users, and prevent them from "shooting themsleves in the feet".

For all of these reasons, we've decided not to expose the fields in the UI. For the Prebuilt Rules Customization epic, we decided that during rule upgrade workflow, we will just save to the updated rule whatever value is in the user's current rule. (See ticket updated accordingly)

FYI @dplumlee

elastic / kibana