elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Fleet] Stream-based programmatic API for installing packages #187646

Closed banderror closed 17 hours ago

banderror commented 4 months ago

Epics: https://github.com/elastic/security-team/issues/1974 (internal), https://github.com/elastic/kibana/issues/174168

Summary

Recently we had an incident in Serverless where Kibana instances crashed with out-of-memory (OOM) errors while installing the security_detection_engine Fleet package, which Security Solution uses to distribute prebuilt detection rules. Fleet loads whole packages into memory before installing their assets, and this package had become too big for that. The incident has been mitigated by temporarily decreasing the number of assets in the package by ~50%. However, this is a short-term measure that we cannot keep in place for long. We need a fundamental solution to this problem in Fleet itself.

Our idea is to introduce a stream-based API for installing Fleet packages.

We hope this solution would help us prevent spikes in memory usage when installing the security_detection_engine package.
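To illustrate why streaming should help, here is a minimal sketch contrasting the two approaches. It is not Fleet's code: `Asset`, `readAssets`, `installBuffered`, and `installStreamed` are hypothetical names, and the "archive" is simulated by an async generator.

```typescript
// Hypothetical shape of a package asset; not Fleet's actual type.
interface Asset {
  path: string;
  content: string;
}

// Stand-in for a package archive: yields assets one at a time instead of
// returning them all in a single array.
async function* readAssets(count: number): AsyncGenerator<Asset> {
  for (let i = 0; i < count; i++) {
    yield { path: `kibana/security_rule/rule-${i}.json`, content: `{"id":${i}}` };
  }
}

// Buffered approach: every asset is collected before installation starts,
// so memory use grows with the size of the package.
async function installBuffered(
  source: AsyncIterable<Asset>,
  install: (asset: Asset) => Promise<void>
): Promise<number> {
  const all: Asset[] = [];
  for await (const asset of source) {
    all.push(asset);
  }
  for (const asset of all) {
    await install(asset);
  }
  return all.length;
}

// Streamed approach: each asset is installed as soon as it is read and can
// be garbage collected before the next one arrives.
async function installStreamed(
  source: AsyncIterable<Asset>,
  install: (asset: Asset) => Promise<void>
): Promise<number> {
  let installed = 0;
  for await (const asset of source) {
    await install(asset);
    installed++;
  }
  return installed;
}
```

Both functions install the same assets in the same order; the difference is only in how many of them are held in memory at once.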

Details

This is where/how Security Solution installs the package on the server side:

https://github.com/elastic/kibana/blob/1040bae64087e2d8fb6a4ef77b93b731b74b8d27/x-pack/plugins/security_solution/server/lib/detection_engine/prebuilt_rules/api/install_prebuilt_rules_and_timelines/install_prebuilt_rules_package.ts#L38-L41

The corresponding method of the PackageClient is:

https://github.com/elastic/kibana/blob/1040bae64087e2d8fb6a4ef77b93b731b74b8d27/x-pack/plugins/fleet/server/services/epm/package_service.ts#L71-L76

We would need a stream-based alternative to the ensureInstalledPackage method.

This could be done by adding an option to the existing method:

  ensureInstalledPackage(options: {
    pkgName: string;
    pkgVersion?: string;
    spaceId?: string;
    force?: boolean;
    stream?: boolean; // <-- NEW OPTION, by default is false
  }): Promise<Installation>;

Or by adding a new method:

  ensureInstalledPackageInStreamMode(options: {
    pkgName: string;
    pkgVersion?: string;
    spaceId?: string;
    force?: boolean;
  }): Promise<Installation>;
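The two variants are close to equivalent from the caller's side. The sketch below shows how the dedicated method could be expressed in terms of the `stream` option. The types are minimal stand-ins for Fleet's real ones, the free-function form of `ensureInstalledPackageInStreamMode` and the `createFakeClient` helper are hypothetical, and neither variant is claimed to match what Fleet eventually shipped.

```typescript
// Minimal stand-ins for the Fleet types named above; the real definitions
// live in the Fleet plugin and carry more fields.
interface Installation {
  name: string;
  version: string;
}

interface EnsureInstalledOptions {
  pkgName: string;
  pkgVersion?: string;
  spaceId?: string;
  force?: boolean;
  stream?: boolean; // proposed option, defaults to false
}

interface PackageClient {
  ensureInstalledPackage(options: EnsureInstalledOptions): Promise<Installation>;
}

// Hypothetical wrapper: the "new method" variant implemented by delegating
// to the "new option" variant.
function ensureInstalledPackageInStreamMode(
  client: PackageClient,
  options: Omit<EnsureInstalledOptions, 'stream'>
): Promise<Installation> {
  return client.ensureInstalledPackage({ ...options, stream: true });
}

// In-memory fake used only to exercise the call shape; it installs nothing.
function createFakeClient(calls: EnsureInstalledOptions[]): PackageClient {
  return {
    async ensureInstalledPackage(options: EnsureInstalledOptions): Promise<Installation> {
      calls.push(options);
      return { name: options.pkgName, version: options.pkgVersion ?? 'latest' };
    },
  };
}
```

A caller such as Security Solution would then pass `{ pkgName: 'security_detection_engine' }` and receive the same `Installation` result as before, so the choice between the two variants is mostly about API surface rather than behavior.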

elasticmachine commented 4 months ago

Pinging @elastic/fleet (Team:Fleet)

elasticmachine commented 4 months ago

Pinging @elastic/security-detections-response (Team:Detections and Resp)

elasticmachine commented 4 months ago

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

elasticmachine commented 4 months ago

Pinging @elastic/security-solution (Team: SecuritySolution)

banderror commented 4 months ago

Hey @kpollich, here's the ticket we promised earlier. @xcrzx is going to switch to it next week (week of July 8). Could we please find someone to actively assist with it from the Fleet side (be available for questions, pair programming, code review, etc.)?

banderror commented 2 months ago

Update from @kpollich:

We do not have the stream-based approach captured in https://github.com/elastic/kibana/issues/187646 scheduled for development at the moment. I think the best approach at this point would be to implement this as a one-off for the security detection engine integration (with potential to "allow list" the streaming approach for other integrations if the need arises).

I don't think this will be trivial to implement broadly for all integrations, as there are complexities around installing things like ML detections, transforms, and soon SLOs (which essentially "wrap" a bunch of other under-the-hood assets), so I fear having streaming on top of all those more complex asset types will be a massive undertaking. For content-only packages, streaming probably makes sense as the default approach, but we've only just landed the package-spec support for these packages and there's still a ways to go before we can start leveraging that broadly: https://github.com/elastic/package-spec/issues/351.

If the generic solution isn't expected to be released by mid to late October, our team will need to start working on an alternative solution as early as mid-September.

Let's just commit to this explicitly today and plan accordingly: there won't be a generic solution available by late October.

Update from @xcrzx:

So I think the plan is for us to start working on an alternative package installation approach in September. I can begin this after returning from my PTO on September 16th. Here's the approach I plan to take:

  1. Introduce a new endpoint in Security Solution for detection rule installation or reuse the existing bootstrap endpoint. The key point is that the implementation will be entirely on the Security Solution side.
  2. Copy the existing package installation logic from Fleet and strip out all code not related to saved object installation.
  3. Rewrite the saved object installation process, switching from savedObject.import to savedObject.bulkCreate for better memory efficiency.
  4. Implement incremental saved object installation without deleting existing objects.
  5. Add stream support.

This is a rough outline. An important note here is that I'll be using the EPR API directly to fetch package information and download package content (or read from disk if it's prebundled). To ensure compatibility with Fleet, I'll reuse the package saved object type, so even if the package is installed through the Security Solution endpoint, it will still be visible in the Integrations UI. The detection rules package will remain installable and upgradeable via Fleet's UI, but this will not be the recommended method. In Security Solution, we'll exclusively use the new installation endpoint.
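Steps 3 to 5 of the outline can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation from the linked ticket: `RuleAsset`, `BulkCreator`, `inBatches`, and `installAssets` are hypothetical names, and `BulkCreator` is a deliberately reduced stand-in for Kibana's saved objects client, keeping only a `bulkCreate` call with an `overwrite` flag.

```typescript
// Hypothetical minimal shapes; Kibana's real saved objects client has a
// richer bulkCreate signature and return value.
interface RuleAsset {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

interface BulkCreator {
  bulkCreate(objects: RuleAsset[], options: { overwrite: boolean }): Promise<void>;
}

// Group a stream of items into arrays of at most `size` items, so that only
// one batch needs to be held in memory at a time.
async function* inBatches<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) {
    yield batch;
  }
}

// Install assets batch by batch. Writing with overwrite: true replaces
// objects that already exist, so nothing has to be deleted up front.
async function installAssets(
  source: AsyncIterable<RuleAsset>,
  client: BulkCreator,
  batchSize = 100
): Promise<number> {
  if (batchSize < 1) {
    throw new Error('batchSize must be at least 1');
  }
  let installed = 0;
  for await (const batch of inBatches(source, batchSize)) {
    await client.bulkCreate(batch, { overwrite: true });
    installed += batch.length;
  }
  return installed;
}
```

The batch size trades memory for round trips: smaller batches keep fewer objects in memory but issue more bulk requests. The default of 100 here is arbitrary.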

Thank you both. With that, I'm removing the 8.16 target from this one. We'll be working on the optimized package installation in a separate ticket: https://github.com/elastic/kibana/issues/192350.

banderror commented 17 hours ago

@xcrzx ended up implementing a server-side programmatic API for stream-based package installation in the Fleet plugin as part of https://github.com/elastic/kibana/issues/192350. I think we can close this one.