Closed banderror closed 17 hours ago
Pinging @elastic/fleet (Team:Fleet)
Pinging @elastic/security-detections-response (Team:Detections and Resp)
Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)
Pinging @elastic/security-solution (Team: SecuritySolution)
Hey @kpollich, here's the ticket we promised earlier. @xcrzx is going to switch to it next week (week of July 8). Could we please find someone to actively assist with it from the Fleet side (be available for questions, pair programming, code review, etc)?
Update from @kpollich:
We do not have the stream-based approach captured in https://github.com/elastic/kibana/issues/187646 scheduled for development at the moment. I think the best approach at this point would be to implement this as a one-off for the security detection engine integration (with potential to "allow list" the streams approach for other integrations if the need arises).
I don't think this will be trivial to implement broadly for all integrations, as there are complexities around installing things like ML detections, transforms, and soon SLOs (which essentially "wrap" a bunch of other under-the-hood assets), so I fear having streaming on top of all those more complex asset types will be a massive undertaking. For content-only packages, streaming probably makes sense as the default approach, but we've only just landed the package-spec support for these packages and there's still a ways to go before we can start leveraging that broadly: https://github.com/elastic/package-spec/issues/351.
If the generic solution isn't expected to be released by mid to late October, our team will need to start working on an alternative solution as early as mid-September.
Let's just commit to this explicitly today and plan accordingly: there won't be a generic solution available by late October.
Update from @xcrzx:
So I think the plan is for us to start working on an alternative package installation approach in September. I can begin this after returning from my PTO on September 16th. Here's the approach I plan to take:
- Introduce a new endpoint in Security Solution for detection rule installation or reuse the existing bootstrap endpoint. The key point is that the implementation will be entirely on the Security Solution side.
- Copy the existing package installation logic from Fleet and strip out all code not related to saved object installation.
- Rewrite the saved object installation process, switching from savedObject.import to savedObject.bulkCreate for better memory efficiency.
- Implement incremental saved object installation without deleting existing objects.
- Add stream support.
This is a rough outline. An important note here is that I'll be using the EPR API directly to fetch package information and download package content (or read from disk if it's prebundled). To ensure compatibility with Fleet, I'll reuse the package saved object type, so even if the package is installed through the Security Solution endpoint, it will still be visible in the Integrations UI. The detection rules package will remain installable and upgradeable via Fleet's UI, but this will not be the recommended method. In Security Solution, we'll exclusively use the new installation endpoint.
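The batching idea behind the bulkCreate and incremental-installation steps can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `SavedObjectLike`, `BulkCreator`, and `installInBatches` are hypothetical names, and the `bulkCreate` stand-in only mimics the general shape of a saved objects client call with an overwrite option. Assets are assumed to arrive as an async iterable (for example, from a stream of package entries).

```typescript
// Hypothetical shapes for illustration only; these are NOT the real Kibana types.
interface SavedObjectLike {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

// Minimal stand-in for the part of a saved objects client used in this sketch.
interface BulkCreator {
  bulkCreate(objects: SavedObjectLike[], options: { overwrite: boolean }): Promise<void>;
}

// Consume assets one at a time and write them in fixed-size batches,
// so at most `batchSize` objects are buffered by this function at once.
async function installInBatches(
  assets: AsyncIterable<SavedObjectLike>,
  client: BulkCreator,
  batchSize = 100
): Promise<number> {
  if (batchSize < 1) {
    throw new Error('batchSize must be at least 1');
  }
  let batch: SavedObjectLike[] = [];
  let installed = 0;

  for await (const asset of assets) {
    batch.push(asset);
    if (batch.length >= batchSize) {
      // overwrite: true updates existing objects in place instead of deleting them first.
      await client.bulkCreate(batch, { overwrite: true });
      installed += batch.length;
      batch = [];
    }
  }

  // Flush the final, possibly smaller, batch.
  if (batch.length > 0) {
    await client.bulkCreate(batch, { overwrite: true });
    installed += batch.length;
  }
  return installed;
}
```

With this shape, peak memory depends on the batch size rather than on the total number of assets in the package, which is the property the plan above is after.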
Thank you both. With that, I'm removing the 8.16 target from this one. We'll be working on the optimized package installation in a separate ticket: https://github.com/elastic/kibana/issues/192350.
@xcrzx ended up implementing a server-side programmatic API for stream-based package installation in the fleet plugin as part of https://github.com/elastic/kibana/issues/192350. I think we can close this one.
Epics: https://github.com/elastic/security-team/issues/1974 (internal), https://github.com/elastic/kibana/issues/174168
Summary
Recently we had an incident in Serverless where Kibana instances would crash with an OOM because of an installation of the security_detection_engine Fleet package that Security Solution uses to distribute prebuilt detection rules. Fleet loads whole packages into memory before installing their assets, and this package had become too big for that. The incident has been mitigated by temporarily decreasing the number of assets in the package by ~50%. However, this is a short-term measure that we cannot keep for a long time. We need a fundamental solution to this problem in Fleet itself.
Our idea is to introduce a stream-based API for installing Fleet packages:
- It would be a programmatic API (exposed via PackageClient) available for Security Solution on the server side, and not available to Kibana users via HTTP.
- Security Solution would wrap this API with its own HTTP API endpoint for installation of the security_detection_engine package.
We hope this solution would help us prevent spikes in memory usage when installing the security_detection_engine package.
Details
This is where/how Security Solution installs the package on the server side:
https://github.com/elastic/kibana/blob/1040bae64087e2d8fb6a4ef77b93b731b74b8d27/x-pack/plugins/security_solution/server/lib/detection_engine/prebuilt_rules/api/install_prebuilt_rules_and_timelines/install_prebuilt_rules_package.ts#L38-L41
The corresponding method of the PackageClient is:
https://github.com/elastic/kibana/blob/1040bae64087e2d8fb6a4ef77b93b731b74b8d27/x-pack/plugins/fleet/server/services/epm/package_service.ts#L71-L76
We would need a stream-based alternative of the ensureInstalledPackage method.
It could be done via adding an option to the existing method:
Or via adding a new method:
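As a rough illustration of how the two options differ for a caller, here is a hypothetical TypeScript sketch. Only the names PackageClient and ensureInstalledPackage come from the text above; the `useStreaming` flag, the `installPackageFromStream` method, the option and result shapes, and the demo class are all assumptions made for this example and do not describe Fleet's actual API.

```typescript
// Hypothetical option and result shapes; parameter names are assumptions.
interface EnsureInstalledOptions {
  pkgName: string;
  pkgVersion?: string;
  // Option A (hypothetical): opt into streaming on the existing method.
  useStreaming?: boolean;
}

interface StreamInstallOptions {
  pkgName: string;
  pkgVersion?: string;
}

interface InstallResult {
  pkgName: string;
  mode: 'in-memory' | 'streaming';
}

// Sketch of a client surface offering both variants side by side.
interface PackageClientSketch {
  ensureInstalledPackage(options: EnsureInstalledOptions): Promise<InstallResult>;
  // Option B (hypothetical): a dedicated stream-based method.
  installPackageFromStream(options: StreamInstallOptions): Promise<InstallResult>;
}

// Demo implementation that only reports which code path would be taken.
class DemoPackageClient implements PackageClientSketch {
  async ensureInstalledPackage(options: EnsureInstalledOptions): Promise<InstallResult> {
    return {
      pkgName: options.pkgName,
      mode: options.useStreaming ? 'streaming' : 'in-memory',
    };
  }

  async installPackageFromStream(options: StreamInstallOptions): Promise<InstallResult> {
    return { pkgName: options.pkgName, mode: 'streaming' };
  }
}
```

An option on the existing method keeps a single entry point but makes its behavior depend on a flag, while a separate method keeps the existing contract untouched and makes the streaming path explicit at the call site.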