elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Make doc values of original field accessible when shadowing it through a runtime field #89093

Open javanna opened 2 years ago

javanna commented 2 years ago

A runtime field can shadow an existing field with the same name, whether that is another runtime field or an indexed field. For instance, a runtime field may shadow an indexed field whose values were computed or formatted incorrectly, correcting the index-time mistake transparently for consumers, without the need to reindex or to define another field under a different name.

When defining the script that computes or formats the values correctly, it is currently not possible to load the existing values from doc_values: the script would refer to a field with the same name, which is detected as a reference loop and forbidden. The current workaround is to always load the original field from _source, but that is more costly.
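To illustrate the workaround, here is a minimal sketch of a search request whose runtime field shadows an indexed field and reads the original value from _source. The field name `response_time` and the seconds-to-milliseconds correction are hypothetical, chosen only for illustration; the script assumes the value in _source is a plain number.

```json
{
  "runtime_mappings": {
    "response_time": {
      "type": "double",
      "script": {
        "source": "def v = params._source['response_time']; if (v != null) { emit(1000.0 * v); }"
      }
    }
  },
  "fields": ["response_time"]
}
```

Referring to `doc['response_time']` in this script instead would hit the loop detection described above, which is why the sketch goes through `params._source` and pays the cost of loading and parsing the source for every document.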

We would like to expose the original values within runtime field scripts, either by using some naming convention to refer to the original field, or by allowing what is now detected as a loop and making it load from doc_values if present. The fact that runtime fields in the search request override runtime fields with the same name defined in the index mappings needs to be taken into account as well: is it necessary for a shadowed runtime field defined in the index mappings to be accessible, or can we limit ourselves to exposing the original values that are in the index (doc_values) and ignore the case where a runtime field shadows another runtime field?

Another scenario where accessing the original value would be useful is when a new field is added to an existing index, and users would like to compute the value for documents that don't have it indexed, while leveraging the faster indexed field when possible. A runtime field could have logic to pull the indexed value when available and compute it otherwise. Without this, it is impossible to leverage the indexed field within such an index when a runtime field of the same name is defined.
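The "pull the indexed value when available, compute it otherwise" logic can be sketched today only under a different field name, which is the limitation this issue is about. In the sketch below, `day_of_week` is assumed to be a keyword field that was added to the mapping recently and is indexed only for newer documents, `@timestamp` is assumed to be present on every document, and `day_of_week_or_computed` is a hypothetical name for the runtime field.

```json
{
  "runtime_mappings": {
    "day_of_week_or_computed": {
      "type": "keyword",
      "script": {
        "source": "if (doc['day_of_week'].size() != 0) { emit(doc['day_of_week'].value); } else { emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT)); }"
      }
    }
  }
}
```

If the runtime field were instead named `day_of_week`, the `doc['day_of_week']` lookup would be rejected as a reference loop, so consumers have to switch to the new name to benefit from the fallback.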

elasticsearchmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

Mpdreamz commented 2 years ago

Would this still require execution of the script or writing the script in a certain way?

Would it be potentially easier (and faster?) for users to simply indicate this as an option? That way, Elasticsearch would first attempt to load the indexed field and only execute the runtime script if a value is missing.

  "runtime_mappings": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
      },
      "shadow": false
    }
  },

javanna commented 2 years ago

We'll have to come up with some syntax that lets a script access the field that the current field shadows. Running a script is not heavy by itself; it's what the script does that matters. In this case, I would worry about getting access to the indexed field, so that I don't have to load it from _source when doc values are available, rather than trying to avoid running the script when a value is available. This way we lean on the flexibility of scripts, without requiring additional options outside of the script itself.

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)