andrewkroh opened this issue 2 weeks ago
A first step here is to add support to the ECS repo for expressing which fields are unordered sets. This can be done before Elasticsearch has the `synthetic_source_keep: "none"` mapping parameter. Once Elasticsearch supports it, we can update the generators to output Elasticsearch mappings containing the parameter.
I would like to start annotating the fields that can receive this optimization, but we need support in the `schema/*.yml` files first.
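The two-step plan (annotate fields in the schema now, emit the mapping parameter later) could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual ECS generator code: the `unordered_set` attribute name, the `es_field_mapping` and `nest` helpers, and the `emit_keep_param` switch are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical parsed form of a schema/*.yml field entry. The `unordered_set`
# key is an illustrative name only, not an existing ECS schema attribute.
HOST_IP_FIELD: Dict[str, Any] = {
    "name": "ip",
    "flat_name": "host.ip",
    "type": "ip",
    "normalize": ["array"],
    "unordered_set": True,
}


def es_field_mapping(field: Dict[str, Any], emit_keep_param: bool) -> Dict[str, Any]:
    """Build a simplified Elasticsearch leaf mapping from a field definition.

    emit_keep_param (hypothetical) lets the schema annotation land before
    Elasticsearch supports synthetic_source_keep; the parameter is only
    emitted once the switch is turned on.
    """
    mapping: Dict[str, Any] = {"type": field["type"]}
    if emit_keep_param and field.get("unordered_set", False):
        mapping["synthetic_source_keep"] = "none"
    return mapping


def nest(flat_name: str, leaf: Dict[str, Any]) -> Dict[str, Any]:
    """Expand a dotted name such as 'host.ip' into nested `properties` objects."""
    *parents, name = flat_name.split(".")
    node: Dict[str, Any] = {name: leaf}
    for parent in reversed(parents):
        node = {parent: {"properties": node}}
    return node


# Before Elasticsearch support: the annotation exists but changes nothing.
before = nest(HOST_IP_FIELD["flat_name"], es_field_mapping(HOST_IP_FIELD, emit_keep_param=False))
# After Elasticsearch support: the same annotation yields the new parameter.
after = nest(HOST_IP_FIELD["flat_name"], es_field_mapping(HOST_IP_FIELD, emit_keep_param=True))
```

Here `before` is `{"host": {"properties": {"ip": {"type": "ip"}}}}` and `after` adds `"synthetic_source_keep": "none"` to the `ip` leaf. Keeping the annotation separate from the emitted parameter means fields can be marked now without producing mappings that older Elasticsearch versions would reject.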
For array fields treated as unordered sets, we should add `synthetic_source_keep: "none"` to the mappings to optimize storage under LogsDB. Fields like `host.ip` and `related.ip` would be candidates because order and duplicates are irrelevant for them. Adding this option prevents the array field from being stored in `_source`. Support for this is in progress in Elasticsearch and will first be available in 8.16.
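One way to reason about whether a field is a candidate: if its values may come back with duplicates removed and in a different order, does any information get lost? The sketch below checks that with set semantics. The `simulate_set_reconstruction` helper is a hypothetical stand-in, and its plain string sort is only illustrative, not Elasticsearch's actual ordering of reconstructed values.

```python
from typing import Iterable, List


def simulate_set_reconstruction(values: Iterable[str]) -> List[str]:
    """Hypothetical model of an array rebuilt without keeping the original:
    duplicates are dropped and order may change. The string sort is an
    illustrative assumption, not Elasticsearch's real ordering."""
    return sorted(set(values))


def equivalent_as_unordered_set(original: Iterable[str], rebuilt: Iterable[str]) -> bool:
    """True when both arrays hold the same distinct values, ignoring order and duplicates."""
    return set(original) == set(rebuilt)


# related.ip: order and duplicates carry no meaning, so it is a candidate.
related_ip = ["10.0.0.5", "192.168.1.10", "10.0.0.5"]
rebuilt_ips = simulate_set_reconstruction(related_ip)

# process.args: argument order matters, so set semantics would lose information.
args = ["cp", "src.txt", "dst.txt"]
rebuilt_args = simulate_set_reconstruction(args)
```

For `related.ip`, the rebuilt array `["10.0.0.5", "192.168.1.10"]` holds the same set of addresses. For `process.args`, the rebuilt array is `["cp", "dst.txt", "src.txt"]`. It contains the same values but swaps source and destination, which is why such a field should not be annotated.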
References

Related: #2372 (no longer relevant, as we switched to an opt-in model for array optimization in logsdb)