Open markharwood opened 2 years ago
Pinging @elastic/es-search (Team:Search)
Some thoughts on this proposal:
- […] host.ip in the case when a host has multiple network interfaces.
- […] agent.id or host.name. Likewise, for field mappings created through dynamic mappings, users are unlikely to be able to set the single-value flag since they don't know about these fields. I believe we're more likely to be successful with the reporting approach than the enforcing approach.
- […] _source. Presumably, always assuming that runtime fields may be multi-valued would defeat a lot of the value we're expecting from this proposal, but we don't have a good way to detect whether runtime fields are single-valued in general. Should we have a flag on them that allows declaring whether they're single- or multi-valued, and maybe assume single-valued by default? (Not a fan of this suggestion, mostly adding it to get the discussion started.)
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Background
For a long time, Elasticsearch has been very permissive about JSON documents and has made no distinction between single values and arrays of values. This permissive approach has several downsides:
1) Client code and scripts are made more complex. To be robust, code must be written to handle both single values and arrays of values.
2) Kibana does some strange things, e.g. it will happily try to "AND" multiple values selected from a bar chart or pie chart, which never makes sense for values taken from a single-valued field. This produces no matches because no document can be OS:ios and OS:android simultaneously.
3) Administrators cannot easily "lock down" the mapping. Custom ingest scripts are required to prevent multi-valued documents being added (and ingest scripts can still be circumvented by clients sending documents?).
All of the above is unfortunate because the majority of fields in common use are single-valued. A weblog's fields are a good example (timestamp, IP, OS, user agent, URL, referrer, country, etc. are all single values).
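
To illustrate downside 1, here is a minimal sketch of the defensive normalization that client code ends up needing today. The helper name as_list and the sample documents are hypothetical and only serve the illustration.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Return a field's values as a list, whatever shape the document used."""
    if value is None:
        # Field is missing or null.
        return []
    if isinstance(value, list):
        # Field was sent as an array of values.
        return value
    # Field was sent as a single value.
    return [value]


# Hypothetical documents: nothing in the mapping tells the client
# which of these shapes to expect for the "os" field.
docs = [
    {"os": "ios"},
    {"os": ["ios", "android"]},
    {},
]

for doc in docs:
    print(as_list(doc.get("os")))
```

If a field were known to be single-valued, the client could read doc["os"] directly and skip this branching.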
Proposed changes
The solution is a two-pronged approach:
- Enforcement: for new indices, we can give administrators the option of rejecting documents with multiple values.
- Reporting: for both new and old indices, we can report whether the index contains only documents with single values.
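
The two prongs can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Elasticsearch implementation: the functions check_document and is_single_valued, the in-memory mappings dictionary, and the exception class are all hypothetical. It also assumes that a field without the flag keeps today's permissive behaviour and that a one-element array counts as a single value; the proposal does not settle either point.

```python
from typing import Any, Dict, Iterable


class MultipleValuesError(ValueError):
    """Raised when a document supplies several values for a single-valued field."""


def value_count(value: Any) -> int:
    """Count the values a document supplies for one field."""
    if value is None:
        return 0
    if isinstance(value, list):
        return len(value)
    return 1


def check_document(doc: Dict[str, Any], mappings: Dict[str, Dict[str, Any]]) -> None:
    """Enforcement: reject a document at ingest time if a locked field has many values."""
    for field, value in doc.items():
        # Assumption: fields default to allowing multiple values.
        allowed = mappings.get(field, {}).get("allow_multiple_values", True)
        if not allowed and value_count(value) > 1:
            raise MultipleValuesError(f"field [{field}] does not allow multiple values")


def is_single_valued(docs: Iterable[Dict[str, Any]], field: str) -> bool:
    """Reporting: true if no document holds more than one value for the field."""
    return all(value_count(doc.get(field)) <= 1 for doc in docs)


mappings = {"os": {"allow_multiple_values": False}}
check_document({"os": "ios", "tags": ["a", "b"]}, mappings)  # accepted

try:
    check_document({"os": ["ios", "android"]}, mappings)
except MultipleValuesError as err:
    print("rejected:", err)

print(is_single_valued([{"os": "ios"}, {"os": "android"}], "os"))
```

Enforcement needs the flag to be set up front, whereas the reporting check also works on existing indices because it only inspects what has already been indexed.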
- Add is_single_valued flag to field caps output which indicates if all documents have single values for a field: https://github.com/elastic/elasticsearch/pull/80730
- Add boolean allowsMultipleValues() method to FieldMapper and remove existing validation code in single-valued fields that is slow. The DocumentParser class should instead assume responsibility for checking that single-valued fields don't receive multiple values.
- Add allow_multiple_values flag to field mappings that can reject documents presenting arrays: https://github.com/elastic/elasticsearch/pull/80289
- […] allow_multiple_values field mapping is set and we know this is enforced at ingest time
- […] allow_multiple_values is set to false (using NumericDocValuesField instead of SortedNumericDocValuesField and SortedDocValuesField instead of SortedSetDocValuesField)
- […] is_single_valued feedback in field-caps (e.g. not ANDing values from this field in filter pills). Mention of related progress here
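
As a rough sketch of how a client such as Kibana might act on the is_single_valued feedback when combining selected values into a filter: the function combine_selected_values and its parameters are hypothetical and do not describe Kibana's actual code, while bool, term, should, filter and minimum_should_match are standard Elasticsearch query DSL.

```python
from typing import Any, Dict, List


def combine_selected_values(field: str, values: List[Any], is_single_valued: bool) -> Dict[str, Any]:
    """Build a bool query from the values a user selected for one field."""
    clauses = [{"term": {field: value}} for value in values]
    if is_single_valued:
        # No document can hold two values at once, so ANDing could never
        # match; match documents holding any one of the selected values.
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    # Multi-valued field: requiring all selected values can still match.
    return {"bool": {"filter": clauses}}


print(combine_selected_values("OS", ["ios", "android"], is_single_valued=True))
print(combine_selected_values("tags", ["a", "b"], is_single_valued=False))
```

With the flag set, selecting OS:ios and OS:android yields a query that can match documents, rather than one that is guaranteed to return nothing.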