elastic / package-spec

EPR package specifications

Support "content-only" integrations #351

Closed jsoriano closed 2 months ago

jsoriano commented 2 years ago

After a conversation about #346 with @ruflin, we came to the conclusion that we may want to have a package type for additional content, something like DLCs for integration packages.

This package type would contain only assets and references to an integration package. Packages of this type could only be installed if the referenced integration package is installed.

Use cases for these packages:

Data streams, or anything that alters ingestion of data, would be excluded from these packages, to avoid complex dependencies and interactions.
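The install rule described above can be sketched roughly as follows. This is only an illustration of the proposal as stated in this comment, not how Fleet or the package spec implements it: the `Package` class, the `extends` field, and the `can_install` helper are all hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    type: str  # "integration" or "content" (type names assumed for illustration)
    extends: str | None = None  # hypothetical: the integration package a content package refers to
    data_streams: list[str] = field(default_factory=list)


def can_install(pkg: Package, installed: set[str]) -> tuple[bool, str]:
    """Return (allowed, reason) under the rules proposed in this comment."""
    if pkg.type != "content":
        return True, "not a content package; rule does not apply"
    if pkg.data_streams:
        # Anything that alters ingestion is excluded from content packages.
        return False, "content packages must not define data streams"
    if pkg.extends is None:
        return False, "content package does not reference an integration package"
    if pkg.extends not in installed:
        return False, f"referenced package {pkg.extends!r} is not installed"
    return True, "ok"
```

For example, a hypothetical `nginx_dashboards` content package with `extends="nginx"` would be accepted only when `"nginx"` is in the set of installed packages.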

Some notes about this:

Tasks

felixbarny commented 2 years ago

with a dependency to the integration package that collects the data

Just the ability to define such a dependency would be a very useful feature on its own. We have a similar use case in APM, where we would like a Java attacher integration to be able to declare a dependency on the APM integration.

cc @eyalkoren @Mpdreamz

jsoriano commented 2 years ago

Java attacher integration

What kind of content would these attacher integrations contain? For this proposal I am excluding in principle anything related to data ingestion.

Also, we don't want to open the door to dependencies in a general sense; the dependencies here would be only between the content package and the package it extends.

Mpdreamz commented 2 years ago

In our case it would be a binary that elastic-agent needs to supervise. See https://github.com/elastic/ingest-dev/issues/982 for a discussion on why we feel this responsibility should not fall on the main package.

Also, we don't want to open the door to dependencies in a general sense; the dependencies here would be only between the content package and the package it extends.

++ fully agree, package management is hard :smile:

That's why the design doc for our feature request around this topic explicitly named them subpackages. A subpackage should only ever belong to one main package.

jsoriano commented 2 years ago

Well, I think that distributing binaries that elastic-agent executes is a different and broader discussion :slightly_smiling_face: I would leave this out of this proposal.

I guess that packages with collector binaries will also need to include information, like mappings, about the additional data these binaries collect. This is something I am explicitly excluding here.

Also, the direction of the dependencies for this kind of package could be different. If we have packages to distribute collector binaries, we could have a "collector" package for Metricbeat, and all integration packages with metrics collected by Metricbeat would depend on it.

ruflin commented 1 year ago

This topic popped up in the context of a potential stream command in elastic-package that would allow you to ship synthetic data for any package in the registry to your cluster: https://github.com/elastic/elastic-package/issues/1541 The problem is that the files under _dev are not part of the package served by the registry, and the same goes for testdata directories and others. These files are removed during the build process (I couldn't find the code quickly), which makes sense, as including them would increase the package size.

The simplest scenario I could see here is that during the publishing process there is an elastic-package build and an elastic-package build -raw that copies over all the files. This would result in a package like kubernetes-1.55.0-raw.zip, which could also be downloaded from the registry but is not directly used by Fleet itself (at least not by default) and is useful for development.
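As a rough sketch of the two-archive idea above: one archive without the development files and one "raw" archive with everything. The `build_archives` helper and the `EXCLUDED_DIRS` set are hypothetical and only illustrate the proposal; this is not how elastic-package builds packages, and the `-raw` flag is the suggestion from this comment.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Assumption based on the comment above: these directories are dropped from
# the regular build. The real build process may exclude more or different paths.
EXCLUDED_DIRS = {"_dev", "testdata"}


def build_archives(src: Path, out_dir: Path, name: str, version: str) -> tuple[Path, Path]:
    """Write <name>-<version>.zip (filtered) and <name>-<version>-raw.zip (all files)."""
    out_dir.mkdir(parents=True, exist_ok=True)  # out_dir should live outside src
    regular = out_dir / f"{name}-{version}.zip"
    raw = out_dir / f"{name}-{version}-raw.zip"
    with zipfile.ZipFile(regular, "w") as reg_zip, zipfile.ZipFile(raw, "w") as raw_zip:
        for path in sorted(src.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(src)
            # The raw archive always gets the file.
            raw_zip.write(path, rel.as_posix())
            # The regular archive skips files under an excluded directory.
            if not EXCLUDED_DIRS.intersection(rel.parts[:-1]):
                reg_zip.write(path, rel.as_posix())
    return regular, raw
```

Called with `name="kubernetes"` and `version="1.55.0"`, this would produce `kubernetes-1.55.0.zip` and `kubernetes-1.55.0-raw.zip` side by side.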

jsoriano commented 1 year ago

The simplest scenario I could see here is that during the publishing process there is an elastic-package build and an elastic-package build -raw that copies over all the files. This would result in a package like kubernetes-1.55.0-raw.zip, which could also be downloaded from the registry but is not directly used by Fleet itself (at least not by default) and is useful for development.

I think it could make sense to build and publish source packages, but I would consider this a different issue. This is a common practice in other open source packaging systems. I have created an issue to support that: https://github.com/elastic/elastic-package/issues/1577

Another option would be to include the build information, something we also want to do (https://github.com/elastic/package-spec/issues/446). This would make it possible to find the source files in the source repository, but would indeed be an additional step.

kpollich commented 5 months ago

Bumping this as it'll be quite important as part of the OTel integrations project, but also for things like https://github.com/elastic/integrations/tree/main/packages/security_detection_engine which has its own set of issues with installation due to its size. Having it designated as a content-only integration and going through a different installation mechanism more optimized for large integrations with many assets will be a positive change overall for Fleet's memory footprint and the overall UX of integrations like this.

kpollich commented 5 months ago

Tweaking the title + labels so this appears properly on the OTel board as a milestone

jsoriano commented 5 months ago

Thinking about the OTel use case, we should probably relax the restrictions on dependencies on integration packages. With OTel there may be no integration package that collects the data.

kpollich commented 4 months ago

We should look to implement a separate installation path in Kibana that's optimized for content-only integrations. We already have a use case for this with large packages like https://github.com/elastic/integrations/tree/main/packages/security_detection_engine that run into memory issues when bulk deleting/importing assets during the existing installation process.

@xcrzx has been doing great work on improving the memory pressure during package installation for the rules package here https://github.com/elastic/kibana/issues/187969, but these are only incremental improvements that don't necessarily address the root cause in the long term.

For content-only integrations, I think we could optimize Fleet's installation code to avoid potentially expensive operations in a situation where a package has many assets.

jsoriano commented 3 months ago

Initial definition for content packages merged in the spec https://github.com/elastic/package-spec/pull/777. Planned to be released as beta in 3.4.0.

Next steps will be to prepare support in elastic-package and the Package Registry. And eventually add support to distribute more kinds of assets and resources.

kpollich commented 2 months ago

I think we can consider this done because the UI work is being tracked separately in https://github.com/elastic/kibana/issues/192484.

Support for discovery features in package-registry.

@mrodm is there anything left to do here to expose the discovery properties in EPR? An issue to implement discovery in Kibana is part of the requirements in the above UI issue, and I don't think we have anything left to implement here.

Next steps will be to prepare support in elastic-package and the Package Registry. And eventually add support to distribute more kinds of assets and resources.

I created https://github.com/elastic/package-spec/issues/803 as a follow-up to support more asset types.

With follow-up issues created for the remaining scope I'm closing this. Our initial support for content packages is implemented and well tested. Thanks all!

mrodm commented 2 months ago

Support for discovery features in package-registry.

@mrodm is there anything left to do here to expose the discovery properties in EPR? An issue to implement discovery in Kibana is part of the requirements in the above UI issue, and I don't think we have anything left to implement here.

@kpollich What is still missing is support in Elastic Package Registry to show/search packages according to discovery features, that is, supporting requests in EPR like

GET /search?discovery=fields:process.pid,user.id

to return packages that can leverage documents that include the process.pid or the user.id fields

cc @jsoriano
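To make the proposed request shape concrete, here is a minimal sketch of how a `discovery=fields:...` query could be parsed and matched against packages. It assumes the parameter format shown in the example request above and "any field matches" semantics; `parse_discovery_fields`, `search`, and the package data are hypothetical and do not reflect the actual EPR implementation.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlsplit


def parse_discovery_fields(url: str) -> list[str]:
    """Extract field names from a query like ?discovery=fields:process.pid,user.id."""
    query = parse_qs(urlsplit(url).query)
    fields: list[str] = []
    for value in query.get("discovery", []):
        kind, _, names = value.partition(":")
        if kind == "fields":
            fields.extend(n for n in names.split(",") if n)
    return fields


def search(packages: dict[str, set[str]], wanted: list[str]) -> list[str]:
    """Return names of packages that declare at least one of the wanted fields."""
    wanted_set = set(wanted)
    return sorted(name for name, declared in packages.items() if declared & wanted_set)
```

With this sketch, a package that declares only `process.pid` and one that declares only `user.id` would both be returned for the example request, while a package declaring neither would not.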

kpollich commented 2 months ago

Got it - thanks for clarifying + creating that issue, Mario. I'm not sure yet what the priority is on the discovery feature here, but we will clarify that soon 🙂

mrodm commented 2 months ago

Got it - thanks for clarifying + creating that issue, Mario. I'm not sure yet what the priority is on the discovery feature here, but we will clarify that soon 🙂

@kpollich just created an issue for that https://github.com/elastic/package-registry/issues/1229