elastic / ecs

Elastic Common Schema
https://www.elastic.co/what-is/ecs
Apache License 2.0

LogsDB: Add synthetic_source_keep = none to arrays where order/duplicates do not matter #2376

Open andrewkroh opened 2 weeks ago

andrewkroh commented 2 weeks ago

For array fields treated as unordered sets, we should add synthetic_source_keep: "none" to the mappings to optimize storage under LogsDB. Fields like host.ip and related.ip would be candidates because order and duplicates are irrelevant.

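With this option, Elasticsearch does not store an extra copy of the original array to preserve element order and duplicates; the synthetic _source is rebuilt from the indexed values instead.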

Support for this is in progress in Elasticsearch and will first be available in 8.16.
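
For illustration, here is a rough sketch of the mapping fragment this proposal would produce for the two candidate fields. The parameter name and value come from the description above; the helper name `unordered_ip_mapping` is made up for this example, and the final placement in the generated artifacts is an assumption.

```python
import json


def unordered_ip_mapping() -> dict:
    """Mapping for an ip field whose values are treated as an unordered set.

    Hypothetical helper: only the parameter name and value are taken from
    the issue text.
    """
    return {"type": "ip", "synthetic_source_keep": "none"}


# Standard nested "properties" layout for the dotted names host.ip and related.ip.
mappings = {
    "properties": {
        "host": {"properties": {"ip": unordered_ip_mapping()}},
        "related": {"properties": {"ip": unordered_ip_mapping()}},
    }
}

print(json.dumps(mappings, indent=2))
```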

andrewkroh commented 1 day ago

A first step here is to add support to the ECS repo for expressing which fields are unordered sets. This can be done before Elasticsearch has the synthetic_source_keep: "none" mapping parameter. Once Elasticsearch has it, we can update the generators to output Elasticsearch mappings containing the parameter.

I would like to begin the process of annotating the fields that can receive this optimization, but we need support in the schema/*.yml files first.
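
A minimal sketch of how the two-step approach could look in a generator, assuming the schema gains a boolean annotation. The attribute name `unordered_set`, the `emit_keep_none` switch, and both function names are hypothetical; none of them exist in the ECS tooling today. The field entries stand in for definitions already parsed from schema/*.yml.

```python
from typing import Any

# Hypothetical parsed schema entries. "unordered_set" is an invented
# annotation name, not an existing ECS field attribute.
FIELDS: list[dict[str, Any]] = [
    {"name": "host.ip", "type": "ip", "unordered_set": True},
    {"name": "host.name", "type": "keyword"},
]


def field_mapping(field: dict[str, Any], emit_keep_none: bool) -> dict[str, Any]:
    """Build the mapping for one field.

    The parameter is emitted only when the field is annotated AND the
    generator has been told that the target Elasticsearch supports it.
    """
    mapping: dict[str, Any] = {"type": field["type"]}
    if emit_keep_none and field.get("unordered_set", False):
        mapping["synthetic_source_keep"] = "none"
    return mapping


def build_properties(fields: list[dict[str, Any]], emit_keep_none: bool) -> dict[str, Any]:
    """Nest dotted field names under "properties" objects."""
    root: dict[str, Any] = {}
    for field in fields:
        *parents, leaf = field["name"].split(".")
        node = root
        for part in parents:
            node = node.setdefault(part, {"properties": {}})["properties"]
        node[leaf] = field_mapping(field, emit_keep_none)
    return root
```

With `emit_keep_none=False` the annotation is carried in the schema but has no effect on the output, which matches the idea of annotating fields now and switching on the mapping parameter later.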