Closed approksiu closed 6 months ago
Pinging @elastic/security-detections-response (Team:Detections and Resp)
Pinging @elastic/security-solution (Team: SecuritySolution)
Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)
@approksiu cc @banderror @nikitaindik @dplumlee @joepeeples
concurrent_searches
and items_per_search
are not defaultable fields, so when a IM rule is created these fields are not initialised if not passed explicitly. However, when the rule runs, the values default to the following numbers if undefined in the rule:
concurrent_searches: 1
items_per_search: 9000
It looks like, from the Acceptance Criteria, we will like to make these fields defaultable to those values, and use those default values in the Rule Creation UI as well. Sounds right?
Regarding the tooltip/explanation in UI/docs, there's no reference to the fields in the docs that I can find, and I feel we need to explain pretty clearly how these fields work; they could otherwise lead to degraded performance and many SDHs, if wrongly configured.
In this discussion I found a pretty good explanation of how the fields work. Hopefully @joepeeples can help us find a way of explaining this clearly and succintly.
Hi @rylnd @nkhristinin
As preparation for the Prebuilt Rules Customization Milestone 3, we'll be working on making the concurrent_searches
and items_per_search
fields of IM rules editable via the UI.
Allowing users to edit this, we think, opens up a lot of surface area for misconfigurations of the rule, performance issues and the possibility of many SDHs. Therefore, we'd like to ask the Detection Engine team, as owners of the execution of the rule, if you could provide us with some guidance, recommendations of reasonable ways to limit the editing of this values.
This discussion dives into how the fields work, but hopefully could give us your opinion on minimum/maximum values, or ranges of values that should be valid, maybe even depending on what the other value is.
We are seeing two ways of enforcing this limits:
What do you think?
Thanks for your help. cc @dplumlee who will be working on this feature
@jpdjere In my opinion it's best to stick to objective statements and examples rather than trying to be prescriptive. The issue with addressing a specific scenario (e.g. "If your rule is timing out but resource utilization is low,") is that it's very fuzzy (there are lots of definitions for "timing out" and "low") and subjective, and there are lots of permutations that we really can't/shouldn't try to enumerate.
Instead, I would objectively describe the configuration, and provide some illustrative examples of what changing that configuration would do, quantitatively. Most of that is already written up as Indicator Match Troubleshooting, so I think it would just be a matter of converting that to user-facing information.
@jpdjere if we end up not putting any hard limits on these and instead are adding a lot of description surrounding what can happen if you start messing with these rule configurations, perhaps we can just display those warnings in the form if the numbers are set above the default values. I see we already have minimum values of 1 for each of the fields defined in the zod schema validation.
Thanks @rylnd!
So @dplumlee, let's ping @approksiu and @ARWNightingale for their approval here: based on Ryland's comment, we would add edit capability to these two features without setting a max value for them on the API or serverside layers.
Instead, we'll provide as much user-facing information on the form as possible. The KB link provided above has definitely good info:
items_per_search: Defines how many indicators we load into memory at a time. Defaults to 9000. A larger number will increase memory usage and the size (in characters) of the elasticsearch query, while reducing the total number of queries performed.
concurrent_searches: Defines how many parallel/concurrent searches are performed at a time. Defaults to 1. A larger number will increase the number of searches performed while reducing the size of each individual query. As an additional guideline, it is recommended that one follows the equation described in the analogous max_concurrent_searches parameter to determine an appropriate value.
Let us know what you think.
@dplumlee Kseniia is away this week, so I'd say we can start development while we reach a decision on this.
My concern, that concurrent_searches
can be misleading for users.
From the reading description above I can have idea that increase in concurrent_searches
should lead to faster rule execution, as we do searches in parallel.
In real life it can be far of true, but probable lead to slower rule execution.
To understand why, we need to know how IM rule works.
We do have 2 indices indexA and indexB.
The rule execution is a loop, when we try to query all documents from one index until we find matches.
We query batch of documents from indexB. Usually 9k, or (per_page which we show later how is calculated)
From each batch, we create a speical query into indexA to find matches with documents from the indexB.
if we find 100 alerts, or indexB has no more documents to query we stop, otherwise go to step 1.
The main confusion can be that with concurrent_searches
we can lead users to think that we parallel step 1, but un reality we parallel step 2.
Also in the code we have this line
const perPage = concurrentSearches * itemsPerSearch;
perPage - is amount of documents we are query. But with current implementation if this value more than 10k, we will have an error.
So we should, probably make some validation in UI, to be sure that it less than 10k.
@joepeeples @ARWNightingale I started building this out and was wondering what y'all's opinion was on how we display any help text and the relevant errors - specifically addressing what Nikita talks about above with the concurrentSearches * itemsPerSearch
value having to be less than 10k.
Help Text
Do we want any help text below these? In the design it lists the maximum values for each but as we've discussed elsewhere in this ticket, the API won't have a max value on either one other than the 10k multiplied value.
Errors
Right now I have the same error displaying below both components as it affects both of them but it looks a bit funky imo. The language could obviously be improved with help from @joepeeples, @ARWNightingale do you think a unified error message for this would be a better solution?
I have the minimum validation added on each as well, this is what it looks like.
@jpdjere re: the concurrentSearches * itemsPerSearch
cannot be greater than 10k validation, what do you think the best way to handle that is for separate defaultable fields? Should we add in validation outside of zod to all the routes that use the respective field OR its default value to calculate whether or not a request body is valid? (e.g. concurrent_searches ?? 1 * items_per_search ?? 9000 <= 10000
)
This would cover both the case where a user inputs both fields and the multiplied value is higher than 10k and the case where they only enter one of the fields, but the multiplied value with what we default to is still over 10k.
EDIT: turns out we currently validate on rule creation (and only rule creation) to have both of them in the request body or neither of them. Since we're changing these fields to be "defaultable" in this PR, we could either stick with this logic and have them both be required in the front end, or we could change to let them both be defaultable separately and follow what I was saying above.
@dplumlee @jpdjere could we put together a design doc (could just be a google doc or github issue) for opening up these fields? The "design" part being more a layout of the anticipated risks and any potential mitigations. Performance is a big concern here - IM rules already often encounter issues around this - particularly in Serverless. If we open these fields up and start getting SDHs from users who play around with this - will there be anything being logged to alert us that these value have been modified from their default values for easy triage?
We have previously received feedback that we offer users too many ways to get themselves in bad configurations, and I'm afraid that's the case here. From Nikita's concerns - it seems users would need some level of technical knowledge about the workings of the rule to properly configure these. As we open up fields like these and max_signals - should there be a new section in the rule creation/edit flow that makes it clear these fields should only rarely be adjusted and they do so at the risk of negatively affecting performance? Something more than just an min/max validation.
cc @marshallmain for technical input
We should have some product input here @approksiu. We can discuss this in the next Simplified Protections meeting, but for context:
What do you think?
Closing ticket after discussion with @approksiu and Detection Engine team:
For all of these reasons, we've decided not to expose the fields in the UI. For the Prebuilt Rules Customization epic, we decided that during rule upgrade workflow, we will just save to the updated rule whatever value is in the user's current rule. (See ticket updated accordingly)
FYI @dplumlee
Epics: https://github.com/elastic/security-team/issues/1974 (internal), https://github.com/elastic/kibana/issues/174168
Summary
We need to expose the
concurrent_searches
anditems_per_search
fields of Indicator Match rules on the Rule Creation/Editing pages and allow editing them. These should be fields with default values in the UI, which should correspond to the default values that are currently used in the API if the fields are not passed in rule create/update requests.Background
We want to allow users to easily specify the prerequisites for their custom rules.
User story
concurrent_searches
anditems_per_search
fields for my custom Indicator Match rules.Acceptance criteria
Question: what values would these fields contain - what UI component would make sense?
Designs
Figma file
Release progress
Planned release date in Serverless: TBD. Planned release date in ESS: TBD (
v8.x.0
).