elastic / package-spec

EPR package specifications
Other
17 stars 70 forks source link

[Change Proposal] Reserve `package` keyword for data stream/dataset values #699

Closed kpollich closed 7 months ago

kpollich commented 7 months ago

Ref https://github.com/elastic/kibana/issues/175254

As of 8.12.1, Fleet defines a custom ingest pipeline processor for all data streams of the form ${type}-${integration}.package@custom as well as a processor of the form ${type}-${integration}-${dataset}@custom. So, it is possible to create a datastream of the name package and incur a collision of these pipeline names, resulting in expected ingestion behavior in cases where a custom pipeline is in use.

I'm proposing we forbid the usage of package as the data stream directory name as well as the dataset property in a data stream manifest file. This will guarantee a package can't ship with this potential collision in place.

kpollich commented 7 months ago

There is one instance of the name package being used for a data stream in the system audit package

https://github.com/elastic/integrations/blob/main/packages/system_audit/data_stream/package/manifest.yml

This means we do wind up with a collision when Fleet generates custom pipeline names, e.g.

{
  "pipeline": {
    "name": "logs-system_audit.package@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `logs` defined by the `system_audit` integration"
  }
},
{
  "pipeline": {
    "name": "logs-system_audit.package@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for the `system_audit.package` dataset"
  }
}

@elastic/security-external-integrations - Is there an explicit motivation for the naming of this data stream as logs-system_audit.package, or is there simply not a specific name relevant to this particular use case?

kpollich commented 7 months ago

One way we can move forward on the Fleet side is to just use .integration for the suffix instead, and make this issue about reserving the word integration rather than package. We won't have to ship a change to that existing data stream this way.

jsoriano commented 7 months ago

One way we can move forward on the Fleet side is to just use .integration for the suffix instead, and make this issue about reserving the word integration rather than package. We won't have to ship a change to that existing data stream this way.

Could we use other form for the name of these ingest pipelines so they don't collide? For example ${type}-${integration}-package@custom and ${type}-${integration}.${dataset}@custom, or ${type}-${integration}@package@custom and ${type}-${integration}.${dataset}@custom

kpollich commented 7 months ago

I think we should avoid adding a suffix to the dataset level pipelines, as they've been supported for quite a while now and we don't have a clean way to migrate users onto the newly named pipelines. We did wind up shipping the .integration suffix, so maybe we can explore forbidding that keyword instead.

It might be good to think about changing the pipeline names to guarantee no collisions as part of a 9.0 breaking change, though.