elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
977 stars 24.82k forks source link

ESQL: Allow reading fields from source dynamically #115092

Open flash1293 opened 2 weeks ago

flash1293 commented 2 weeks ago

Description

Currently, it's not possible to directly read fields which are only available in source, but not mapped:

PUT my-restricted-index
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "a": {
        "type": "keyword"
      }
    }
  }
}

POST my-restricted-index/_doc
{
  "a":  "fsfd",
  "b": "xyz"
}

// fails with "Unknown column"
POST _query
{
  "query": "FROM my-restricted-index | WHERE b == \"xyz\""
}

However, it's possible to make this work using runtime fields:


PUT my-restricted-index/_mapping
{
    "runtime": {
      "b": {
        "type": "keyword"
      }
    }
}

// works now
POST _query
{
  "query": "FROM my-restricted-index | WHERE b == \"xyz\""
}

To allow users to both control their storage cost by not mapping and indexing rarely used fields and still being able to comfortably querying them via ESQL, there should be a way to instruct ESQL to read from source without having to deal with runtime fields. A possible syntax could look like this:

// works even without runtime field mapping
POST _query
{
  "query": "FROM my-restricted-index | EVAL b = FROM_SOURCE(\"b\", \"keyword\") | WHERE b == \"xyz\""
}

Ideally, there is a way for FROM_SOURCE to transparently leverage indexed fields if they exist - in this case the example above would check whether the field b is mapped - if not, it will retrieve it from source like the runtime field. If it's mapped as a keyword already, FROM_SOURCE becomes a no-op.

elasticsearchmachine commented 2 weeks ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

nik9000 commented 2 weeks ago

Ideally, there is a way for FROM_SOURCE to transparently leverage indexed fields if they exist - in this case the example above would check whether the field b is mapped - if not, it will retrieve it from source like the runtime field. If it's mapped as a keyword already, FROM_SOURCE becomes a no-op.

That sounds pretty possible.