elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Add index-time composite fields #77625

Open romseygeek opened 3 years ago

romseygeek commented 3 years ago

We added runtime composite fields in #75108, which allow users to specify a runtime script that emits multiple values, each of which may be referred to separately as its own typed subfield. We should also add this functionality at index time.

An index-time composite field can be defined like this:

"properties" : {
  "log" : {
    "type" : "composite",
    "script" : "emit(grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value))",
    "fields" : {
      "clientip" : {
        "type" : "ip"
      },
      "response" : {
        "type" : "long"
      }
    }
  }
}

This will generate index-time fields called 'log.clientip' and 'log.response', with the appropriate types. The values of those fields will be generated by running the top-level script as part of the general index-time scripting step. It should be an error to index a document containing log, log.clientip or log.response fields.
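
To make the intended behaviour concrete, here is a rough Python sketch of the two rules described above: expanding the map emitted by the script into typed, dotted subfields, and rejecting a document that supplies those fields itself. This is only an illustration of the proposal, not Elasticsearch code. The names `expand`, `check_source` and `CONVERTERS` are hypothetical, the converters are stand-ins for real type coercion, and the source document is assumed to be already flattened to dotted keys.

```python
import ipaddress

# Hypothetical declaration mirroring the mapping above: subfield name -> type.
COMPOSITE_NAME = "log"
SUBFIELDS = {"clientip": "ip", "response": "long"}

# Illustrative converters only; not Elasticsearch's real type coercion.
CONVERTERS = {"ip": lambda v: str(ipaddress.ip_address(v)), "long": int}


def reserved_paths():
    """Paths that only the script may populate: the parent and each subfield."""
    return {COMPOSITE_NAME} | {f"{COMPOSITE_NAME}.{name}" for name in SUBFIELDS}


def check_source(flat_source):
    """Raise if the (flattened) source supplies a script-generated field."""
    clashes = sorted(reserved_paths() & set(flat_source))
    if clashes:
        raise ValueError(f"fields {clashes} are script-generated and cannot be indexed directly")


def expand(emitted):
    """Turn the map emitted by the script into typed, dotted index-time fields."""
    return {
        f"{COMPOSITE_NAME}.{name}": CONVERTERS[ftype](emitted[name])
        for name, ftype in SUBFIELDS.items()
        if name in emitted
    }
```

For example, `expand({"clientip": "127.0.0.1", "response": "200"})` yields entries for `log.clientip` and `log.response`, while `check_source({"log.response": 200})` raises a `ValueError`.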

elasticmachine commented 3 years ago

Pinging @elastic/es-search (Team:Search)

ruflin commented 2 years ago

We are currently working to better measure the performance impact of ingest pipelines, especially around grok. Quite a few of these ingest pipelines could eventually be converted to composite runtime fields. What I wonder is how performance can be measured with indexed composite runtime fields. For ingest pipelines, we currently use the stats API.
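
For context, a minimal sketch of the kind of measurement available for ingest pipelines today: summing grok processor counts and time per pipeline from a node stats response. It assumes the response has already been fetched and parsed into a dict, and that it follows the `nodes.<id>.ingest.pipelines.<name>.processors` layout with a `type` and `stats` object per processor. The helper name `grok_stats` is hypothetical, and nothing equivalent is implied to exist for index-time scripts.

```python
def grok_stats(nodes_stats):
    """Sum (count, time_in_millis) of grok processors per pipeline across nodes."""
    totals = {}
    for node in nodes_stats.get("nodes", {}).values():
        pipelines = node.get("ingest", {}).get("pipelines", {})
        for name, pipeline in pipelines.items():
            # Each processors entry is a single-key dict: {type_or_tag: {...}}.
            for entry in pipeline.get("processors", []):
                for proc in entry.values():
                    if proc.get("type") != "grok":
                        continue
                    stats = proc.get("stats", {})
                    count, millis = totals.get(name, (0, 0))
                    totals[name] = (
                        count + stats.get("count", 0),
                        millis + stats.get("time_in_millis", 0),
                    )
    return totals
```

The result maps each pipeline name to a `(count, time_in_millis)` pair, which is the sort of per-step figure that would be useful to have for indexed composite fields as well.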

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)