Open banderror opened 4 months ago
As an alternative to being limited to what is included in a single package, can we revisit additional rules being fetched from older EPR packages as needed within the specific workflows? I imagine this mostly applies when diving into a single rule?
rule -> load rule
load historical -> iterate EPR and fetch versions
Epics: https://github.com/elastic/security-team/issues/1974 (internal), https://github.com/elastic/kibana/issues/174168
Summary
Recently we had an incident in Serverless where Kibana instances would crash with an OOM because of an installation of the `security_detection_engine` Fleet package that Security Solution uses to distribute prebuilt detection rules. Fleet loads whole packages into memory before installing their assets, and this package had become too big for that. The incident has been mitigated by temporarily decreasing the number of assets in the package by ~50%. However, this is a short-term measure that we cannot keep for long, because we won't be able to release Milestone 3 of the prebuilt rule customization feature with the current limit of 2 versions per rule in the package.
Before we can release Milestone 3, we will need to raise the number of versions per rule we ship in the package again. In general, the more versions we ship, the better the UX for upgrading prebuilt rules; the fewer versions we ship, the lighter the package, which also positively affects the UX and increases reliability.
Our goal is to find a balance between reliability and good UX and achieve both. For that, we need to come up with smart and efficient limits for the package with prebuilt rules.
Ideas
Total limits for the package as a whole:
Per rule limits:
- Ship `<= X` versions no matter what. Exclude older versions and keep newer ones.
- Exclude versions older than `now - X days`.
- Ship `<= X` versions created within the last 3 months, `<= Y` within the last 6 months, `<= Z` within the last 12 months, etc., where `X < Y < Z < ...`, e.g. `4 < 6 < 7`. Time ranges grow exponentially while limits grow more slowly than that: logarithmically or linearly.
- Allow no more than `X` versions within a `Y` time window. E.g. it could be no more than `2` per each `3 months`. This could help prevent some "noisy" rules (in terms of how frequently they are updated) from "eating" too much of the package's space, and also prevent older versions of a noisy rule from being evicted by newer versions of the same rule.
Todo
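To make the tiered per-rule limit from the Ideas list more concrete, here is a minimal TypeScript sketch. It is not how the package is actually built: the `RuleVersion` and `Tier` types, the `selectVersions` function, and the example tier values (which mirror the `4 < 6 < 7` example for 3, 6, and 12 months) are all hypothetical. The limits are treated as cumulative, so versions counted in the 3-month tier also count toward the 6-month and 12-month tiers, and anything older than the widest window is dropped.

```typescript
// Hypothetical shape of one historical version of a prebuilt rule.
interface RuleVersion {
  ruleId: string;
  version: number;
  createdAt: Date;
}

// Hypothetical tier: keep at most `maxVersions` versions created
// within the last `withinDays` days (cumulative across tiers).
interface Tier {
  withinDays: number;
  maxVersions: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed example values: <= 4 within ~3 months, <= 6 within ~6 months,
// <= 7 within ~12 months.
const EXAMPLE_TIERS: Tier[] = [
  { withinDays: 90, maxVersions: 4 },
  { withinDays: 180, maxVersions: 6 },
  { withinDays: 365, maxVersions: 7 },
];

// Selects which versions of a single rule to ship, newest first.
function selectVersions(versions: RuleVersion[], tiers: Tier[], now: Date): RuleVersion[] {
  const tiersByWindow = [...tiers].sort((a, b) => a.withinDays - b.withinDays);
  const newestFirst = [...versions].sort(
    (a, b) => b.createdAt.getTime() - a.createdAt.getTime()
  );

  const kept: RuleVersion[] = [];
  for (const v of newestFirst) {
    const ageDays = (now.getTime() - v.createdAt.getTime()) / DAY_MS;
    // Narrowest time window that still contains this version.
    const tier = tiersByWindow.find((t) => ageDays <= t.withinDays);
    if (tier === undefined) {
      break; // Older than the widest window; all remaining versions are older too.
    }
    // Everything kept so far is newer than `v`, so it lies inside the same
    // window, and `kept.length` is the cumulative count for this tier.
    if (kept.length < tier.maxVersions) {
      kept.push(v);
    }
  }
  return kept;
}

// Usage: one rule with 11 versions at the given ages (in days).
const exampleNow = new Date("2024-12-01T00:00:00Z");
const exampleAges = [10, 20, 30, 40, 50, 100, 150, 170, 200, 300, 400];
const exampleHistory: RuleVersion[] = exampleAges.map((age, i) => ({
  ruleId: "example-rule",
  version: exampleAges.length - i, // the newest version has the highest number
  createdAt: new Date(exampleNow.getTime() - age * DAY_MS),
}));

const exampleKept = selectVersions(exampleHistory, EXAMPLE_TIERS, exampleNow);
console.log(exampleKept.map((v) => v.version).join(", "));
// prints "11, 10, 9, 8, 6, 5, 3"
```

In this example the fifth version inside the 3-month window (version 7) is skipped, yet older versions from the 6-month and 12-month windows are still kept. That is the property the tiered limits aim for: a frequently updated rule cannot fill the whole per-rule budget with recent versions alone.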