elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

[Milestone 1] Create a versioned findings latest transform with index alias in the integrations repository #10251

Open opauloh opened 3 days ago

opauloh commented 3 days ago

Motivation

As part of the namespace support changes, we need to update the logs-cloud_security_posture.findings_latest index template so that the target index does not map the data_stream.namespace property as constant_keyword.

Changing the mapping of data_stream.namespace from constant_keyword to keyword is a breaking change, so we need a migration plan for the existing indices.

While working on the 3rd party data ingestion proposal, Max discovered that the Threat Intelligence integrations use index versioning and an index alias to deal with breaking changes. This ticket proposes that we do something similar.

As part of this change, we also plan to deprecate the logs-cloud_security_posture.findings_latest-default index. Its name might imply that it contains only data from the default namespace, whereas the latest transforms used in our integrations will contain data from all namespaces.

@maxcold also proposed using logs-cloud_security_posture.findings_latest_misconfigurations_cdr as the alias, for consistency with the proposal for 3rd party data (which uses logs-{integration}.{datastream}_latest_misconfigurations_cdr). That way, a single wildcard query against logs-*_latest_misconfigurations_cdr would cover our native data as well as 3rd party data.
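To illustrate how one wildcard could select both native and 3rd party aliases, here is a minimal Python sketch. It uses fnmatch only as a stand-in for Elasticsearch index-pattern matching, which is an approximation. The pattern logs-*_latest_misconfigurations_cdr and the 3rd party alias name logs-acme_scanner.misconfiguration_latest_misconfigurations_cdr are assumptions made for the example, not names taken from the proposal.

```python
from fnmatch import fnmatchcase

# Assumed wildcard for the shared alias naming scheme (illustrative only).
ALIAS_PATTERN = "logs-*_latest_misconfigurations_cdr"


def matching_aliases(aliases, pattern=ALIAS_PATTERN):
    """Return the aliases whose names match the wildcard pattern."""
    return [alias for alias in aliases if fnmatchcase(alias, pattern)]


aliases = [
    # Native alias proposed in this ticket.
    "logs-cloud_security_posture.findings_latest_misconfigurations_cdr",
    # Hypothetical 3rd party alias following the same naming scheme.
    "logs-acme_scanner.misconfiguration_latest_misconfigurations_cdr",
    # Index planned for deprecation; it should not match the pattern.
    "logs-cloud_security_posture.findings_latest-default",
]

selected = matching_aliases(aliases)
for alias in selected:
    print(alias)
```

Only the two names ending in _latest_misconfigurations_cdr are selected, so the deprecated -default index would fall outside such a query.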

Implementation

Given the proposal, we should update our latest transform to write to a versioned destination index. We can then increment the version number whenever a field mapping changes, which avoids migration errors, while the alias stays the same so that queries keep working. An example of the initial setup, in YAML format:

dest:
  index: "logs-cloud_security_posture.findings_latest_misconfigurations_transform-1"
  aliases:
    - alias: "logs-cloud_security_posture.findings_latest_misconfigurations_cdr"
      move_on_creation: true

Note that move_on_creation is set to true, which is important when using a versioned transform: it ensures the alias points only at the newest destination index.

move_on_creation: (Optional, boolean) Whether or not the destination index should be the only index in this alias. If true, all the other indices will be removed from this alias before adding the destination index to this alias. Defaults to false.
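The effect of move_on_creation on a version bump can be modelled with a small in-memory sketch. This is not Elasticsearch code: next_versioned_index and attach_alias are hypothetical helpers written for illustration, and the alias state is just a Python dict. It only mirrors the behaviour described in the definition above.

```python
import re


def next_versioned_index(index_name):
    """Return index_name with its trailing -<number> suffix incremented."""
    match = re.fullmatch(r"(.+)-(\d+)", index_name)
    if match is None:
        raise ValueError(f"no numeric version suffix in {index_name!r}")
    base, version = match.groups()
    return f"{base}-{int(version) + 1}"


def attach_alias(alias_members, alias, dest_index, move_on_creation=False):
    """Model the alias state after a transform creates dest_index.

    With move_on_creation=True every other index is dropped from the
    alias first; otherwise dest_index is simply added to it.
    """
    members = set() if move_on_creation else set(alias_members.get(alias, set()))
    members.add(dest_index)
    updated = dict(alias_members)
    updated[alias] = members
    return updated


ALIAS = "logs-cloud_security_posture.findings_latest_misconfigurations_cdr"
v1 = "logs-cloud_security_posture.findings_latest_misconfigurations_transform-1"
v2 = next_versioned_index(v1)

# Initial install, then an upgrade that bumps the destination index version.
state = attach_alias({}, ALIAS, v1, move_on_creation=True)
state = attach_alias(state, ALIAS, v2, move_on_creation=True)

# The same upgrade without move_on_creation leaves both versions in the alias.
without_move = attach_alias({ALIAS: {v1}}, ALIAS, v2, move_on_creation=False)

print(sorted(state[ALIAS]))
print(sorted(without_move[ALIAS]))
```

With move_on_creation enabled, the alias ends up containing only the -2 index after the upgrade. Without it, queries through the alias would hit both the old and the new destination index.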

Definition of done

Out of scope

Related tasks/epics

maxcold commented 3 days ago

@opauloh FYI, I updated the ticket description and replaced "findings" with "misconfigurations" in the naming proposal, following the update in the RFC, to better match the terminology. We have Misconfiguration findings and Vulnerability findings, so we should probably avoid referring to Misconfiguration findings as just "findings".

opauloh commented 8 hours ago

Update: I worked on a POC to check and demonstrate what the required steps would be to have the latest transform created in the integrations repository.

From what I saw, the mappings and fields for the transform can be defined separately in the yml files in the transform/misconfigurations/fields folder, without having to update the data stream fields. So it looks like we can avoid the breaking change to the data_stream.namespace field.

It also looks like the data_stream.namespace field is already mapped as keyword in the logs-cloud_security_posture.findings-* data view (although that change is made in Kibana during plugin initialization).

The data stream index still maps the field as constant_keyword, but that doesn't seem to affect any functionality, as I could insert findings with different namespaces.