elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Allow composite runtime fields to add top level fields #87690

Open ruflin opened 2 years ago

ruflin commented 2 years ago

Composite runtime fields are especially useful with grok / dissect, where multiple fields are extracted at once. AFAIK there is currently a limitation that all of these sub-fields must share the same prefix (the name of the composite field), which means they cannot be mapped to ECS properly. Below is a simplified example to demonstrate the problem.

DELETE _data_stream/logs-example-default
PUT _data_stream/logs-example-default

PUT logs-example-default/_mappings
{
  "runtime": {
    "example": {
      "type": "composite",
      "script": """
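        // Split the log line into source.ip, timestamp and http.request.method.
        Map fields = dissect('%{source.ip} [%{timestamp}] %{http.request.method}').extract(params["_source"]["message"]);
        // Convert the extracted timestamp string to epoch milliseconds for the date sub-field.
        DateTimeFormatter dtf = DateTimeFormatter.ofPattern("dd/MMM/yyyy:H:m:s Z");
        ZonedDateTime zdt = ZonedDateTime.parse(fields["timestamp"], dtf);
        long datetime = zdt.toInstant().toEpochMilli();

        fields["timestamp"] = datetime;
        // Each emitted key becomes a sub-field under the composite field name ("example").
        emit(fields);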
      """,
      "fields": {
        "timestamp": {
          "type": "date"
        },
        "source.ip": {
          "type": "ip"
        },
        "http.request.method": {
          "type": "keyword"
        }
      }
    }
  }
}

POST logs-example-default/_doc/
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "67.43.156.13 [07/Dec/2016:10:34:43 +0100] GET"
}

GET logs-example-default/_search
{
  "fields" : ["*"]
}

The data should be in the source.ip field, but because of this limitation it ends up in example.source.ip. I tried adding an alias from source.ip to example.source.ip to at least get the query to work, but I would argue this is not a great solution either, as it would prevent documents from having actual data in the source.ip field itself.
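
To illustrate the limitation, here is a sketch of a query against the example above (not part of the original report, and not verified against a running cluster). The filter and the requested fields have to use the prefixed name example.source.ip, because the ECS name source.ip is not mapped:

```
# Sketch only: filters on the prefixed sub-field of the "example" composite field.
GET logs-example-default/_search
{
  "query": {
    "term": {
      "example.source.ip": "67.43.156.13"
    }
  },
  "fields": ["example.*"]
}
```

A dashboard or detection rule written against ECS would filter on source.ip instead, which is why the prefix is a problem here.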

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

nik9000 commented 2 years ago

You should be able to use a field alias to "float" it out. But I've marked this as discuss so folks can talk and see what they'd like to do.

javanna commented 2 years ago

heya @ruflin could you expand on why having a common prefix is a limitation and why you'd need to send these fields to the top-level? Do they replace some existing fields?

Wouldn't the field alias/additional runtime field solution be equivalent to a built-in solution when it comes to preventing a field with the same name from holding data? This may stem from the fact that field aliases are defined under properties, so maybe I would try out a runtime field, which does not prevent you from indexing data into a field with the same name, although that field would be shadowed and not accessible at search time.

ruflin commented 2 years ago

When we ingest data, we try to map it to ECS. The above is a simplified example of nginx logs. Currently we do it all via ingest pipelines, but in many cases we don't have to index all the data and would like to do it with runtime fields instead. The expected outcome is that if we extract some fields, these should still be in ECS, for example source.ip and http.request.method.

I couldn't fully follow your second comment. But my ideal outcome would be, that I could have documents that have source.ip as an indexed field inside and other documents on the same data stream where it is a runtime field. But that is an additional goal, after I can do the correct mapping to ECS fields. Even better would be if I could convert my composite runtime field to an indexed field like I can do for other runtime fields.

javanna commented 2 years ago

> But my ideal outcome would be, that I could have documents that have source.ip as an indexed field inside and other documents on the same data stream where it is a runtime field.

I see, but then each index would have the field either as a runtime field or as an indexed field? This reminds me of the discussion happening in #86536. The way we have envisioned these changes so far is that they happen at the next rollover, hence you would not have an index with a mixed approach. In that case a field alias should work? What I was hinting at with the second part of my comment is that field aliases could be re-implemented as runtime fields. Effectively you can already implement a field alias through a runtime field, but you need to define a script for it, which is not fantastic for the user experience. If a field alias is defined under runtime, an indexed field with the same name can still be mapped under properties, although it is shadowed. Though I was questioning whether this is a concern at all, assuming that each index should have only one variant of the field in question.

Converting a composite runtime field to indexed is on the roadmap, see #77625.

ruflin commented 2 years ago

Great to see composite runtime fields on the roadmap and https://github.com/elastic/elasticsearch/issues/86536 is interesting indeed.

Taking all of the above into account, going back to the initial question, and putting aside the discussion of whether the runtime or the mapped field takes precedence at query time, I would still like to be able to set source.ip directly in the composite runtime field. Does my explanation around ECS help clarify why this is needed?

javanna commented 2 years ago

I had a chat with @ruflin and I now have a better understanding of the problem. Field aliases can only point to indexed fields, mapped under properties, and not to runtime fields. The current workaround is to create a runtime field with a script that emits the value of example.source.ip. One follow-up could be that field aliases should really be implemented as runtime fields (see #87969). Even better, one may wonder why there is a need to declare a second field to expose the grok sub-field to the top-level. We discussed this last point quite a bit when we were designing the composite runtime field, but it does not hurt to look back and discuss it again.
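
For reference, the workaround described above could look roughly like the following. This is a minimal sketch, assuming the example composite runtime field from the original report is already mapped on the data stream. The two top-level runtime fields and their scripts are illustrative and have not been verified against a running cluster.

```
# Sketch only: exposes the composite sub-fields under their ECS names.
# Assumes the "example" composite runtime field from the original report exists.
PUT logs-example-default/_mappings
{
  "runtime": {
    "source.ip": {
      "type": "ip",
      "script": """
        // Re-emit the sub-field value; skip documents where nothing was extracted.
        if (doc['example.source.ip'].size() > 0) {
          emit(doc['example.source.ip'].value);
        }
      """
    },
    "http.request.method": {
      "type": "keyword",
      "script": """
        // Same pattern for the second ECS field.
        if (doc['example.http.request.method'].size() > 0) {
          emit(doc['example.http.request.method'].value);
        }
      """
    }
  }
}
```

As noted earlier in the thread, a runtime field defined this way shadows an indexed field with the same name at search time, and every exposed sub-field needs its own script, which is the extra declaration this issue asks to avoid.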

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)