elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Expose information about multi-fields in _field_caps #75474

Open timroes opened 3 years ago

timroes commented 3 years ago

In Kibana we need to know whether a field is an object field or a child of an object field, a nested field (or a child of one), or a multi-field, since these are handled slightly differently (partly due to query concerns, e.g. for nested fields, and partly for display purposes, e.g. for multi-fields).

To calculate this, for every field we find we check whether its name contains a dot and, if so, whether any "prefix" (i.e. part before a dot) is an object or nested field. If so, we consider the field a child of a nested/object field and, for nested fields, store the name of that parent field in the field information we keep in the index pattern. If we find a prefix match that is neither an object nor a nested type, we assume the field is a multi-field, mark it as such, and store its parent. (See field_caps_response.ts)
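The prefix heuristic described above can be sketched roughly as follows. This is a simplified, hypothetical TypeScript illustration, not the actual field_caps_response.ts code: `inferSubType` and `InferredSubType` are made-up names, and the input is assumed to be a flat map from each field name to the single type field caps reported for it.

```typescript
// Hypothetical, simplified sketch of the prefix heuristic; not Kibana's real code.
interface InferredSubType {
  nested?: { path: string }; // nearest nested ancestor, if any
  multi?: { parent: string }; // *guessed* multi-field parent, if any
}

// `types` maps field name -> mapping type, e.g.
// { "message": "keyword", "message.text": "text", "user": "nested" }.
function inferSubType(name: string, types: ReadonlyMap<string, string>): InferredSubType {
  const result: InferredSubType = {};
  const parts = name.split(".");
  // Walk the prefixes from longest to shortest: "a.b.c" -> "a.b" -> "a".
  for (let i = parts.length - 1; i > 0; i--) {
    const prefix = parts.slice(0, i).join(".");
    const type = types.get(prefix);
    if (type === undefined || type === "object") {
      continue; // not a known field, or just an object namespace
    }
    if (type === "nested") {
      if (result.nested === undefined) result.nested = { path: prefix };
    } else if (result.multi === undefined && result.nested === undefined) {
      // The nearest concrete (non-object, non-nested) prefix, with no nested
      // ancestor in between, is assumed to be the multi-field parent.
      result.multi = { parent: prefix };
    }
  }
  return result;
}
```

The fragile part is the last branch: any concrete prefix is assumed to be a multi-field parent, which is exactly the guess this issue asks to replace with information from _field_caps.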

It would be more stable if we could get that information from the _field_caps API instead of having to "guess" it. I believe that with runtime fields there may now even be scenarios where we "wrongly" detect the type, if a user overrides multi-fields or nested/object children in specific ways.

I'd ask whether we can have _field_caps respond like the following for a multi-field (the parameter name is of course work-in-progress :wink:). This would allow us to build a much more resilient implementation in Kibana.

{
  "message" : {
    "keyword" : {
      "type" : "keyword",
      "metadata_field" : false,
      "searchable" : true,
      "aggregatable" : true
    }
  },
  "message.text" : {
    "text" : {
      "type" : "text",
      "metadata_field" : false,
      "searchable" : true,
      "aggregatable" : false,
      "multi_field_parent": "message"
    }
  }
}
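If _field_caps returned the proposed key, a consumer such as Discover could group multi-fields under their parent without any prefix guessing. A minimal sketch, assuming the response shape of the example above; `multi_field_parent` is the key proposed in this issue (not an existing one), and `groupMultiFields` is a hypothetical helper.

```typescript
// Sketch only: "multi_field_parent" is the proposed, not a shipped, key.
interface FieldCapability {
  type: string;
  metadata_field: boolean;
  searchable: boolean;
  aggregatable: boolean;
  multi_field_parent?: string;
}

// field name -> mapping type -> capabilities, as in the example above
type FieldCapsFields = Record<string, Record<string, FieldCapability>>;

// Returns parent field name -> names of its multi-fields.
function groupMultiFields(fields: FieldCapsFields): Map<string, string[]> {
  const children = new Map<string, string[]>();
  for (const [name, byType] of Object.entries(fields)) {
    for (const caps of Object.values(byType)) {
      const parent = caps.multi_field_parent;
      if (parent === undefined) continue; // not a multi-field
      const list = children.get(parent) ?? [];
      if (list.indexOf(name) === -1) list.push(name); // one entry per field, even with several types
      children.set(parent, list);
    }
  }
  return children;
}
```

Applied to the example response, this yields a single group, `message` with child `message.text`.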

cc @mattkime

elasticmachine commented 3 years ago

Pinging @elastic/es-search (Team:Search)

romseygeek commented 3 years ago

It looks as though this would also be useful for the @elastic/es-ql and @elastic/ml-core teams, both of which have Java code within Elasticsearch that takes field caps output and attempts to reverse-engineer parent relationships.

We can add two new optional fields to the output, parent and parent-type, the latter of which could be nested, multifield, flattened, composite, etc. We currently output plain object fields as part of the field caps response as well, but I'd be interested to hear if these are actually used anywhere. Object mappers in elasticsearch are really more about namespacing and I don't think they have any strictly functional use at the moment.

timroes commented 3 years ago

I would really appreciate it if we could get the parent information for other field types as well, since that would also let us simplify the logic we currently compute ourselves for those types.

We currently output plain object fields as part of the field caps response as well, but I'd be interested to hear if these are actually used anywhere.

We actually "use" them right now in the sense that we need them to calculate the "parent path", but then "throw them away" as fields in Kibana. That means that if parent and parent-type were generally available and we could leverage them, we would no longer need the object field itself in Kibana.

Another advantage of having that information for all types, not just multi-fields, is that ES could take care of more edge cases. For example, "shadowing" a parent field with a runtime field of a different type might lead to weird results, and it would be better if ES decided how to handle that. Say we have a name field and a name.keyword field, and a runtime field named name now overrides the former: is name.keyword in that case really still a multi-field of name?

romseygeek commented 3 years ago

We actually "use" them right now in the sense that we need them to calculate the "parent path", but then "throw them away" as fields in Kibana. That means that if parent and parent-type were generally available and we could leverage them, we would no longer need the object field itself in Kibana.

OK, good to know. I think we'd want to continue returning the parent field itself even if it wasn't requested, particularly since this would let us deal with grandparents, great-grandparents, etc. So, for example, if we have a doubly-nested field that contains a multi-field, we'd return both nested parents and the parent field, as well as the requested field. Something like this:

GET index/_field_caps?fields=grandparent.parent.field.with.dot.keyword
{
  "fields" : {
    "grandparent" : {
      "nested" : { 
        "type" : "nested",
        ...
      }
    },
    "grandparent.parent" : {
      "nested" : { 
        "type" : "nested",
        "parent" : "grandparent",
        "parent-type" : "nested",
        ...
      }
    },
    "grandparent.parent.field.with.dot" : {
      "text" : { 
        "type" : "text",
        "parent" : "grandparent.parent",
        "parent-type" : "nested",
        ...
      }
    },
    "grandparent.parent.field.with.dot.keyword" : {
      "keyword" : {
        "type" : "keyword",
        "parent" : "grandparent.parent.field.with.dot",
        "parent-type" : "multifield",
        ...
      }
    }
  }
}
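Returning the parents alongside the requested field means a client can recover the whole ancestry by following `parent` links. A minimal sketch, assuming the proposed `parent` and `parent-type` keys from the example above (a proposal, not a shipped API); `ancestors` is a hypothetical helper and assumes all type entries for a field agree on the parent.

```typescript
// Sketch against the *proposed* response shape; key names may change.
interface ProposedCaps {
  type: string;
  parent?: string;
  "parent-type"?: string; // e.g. "nested", "multifield"
}

type ProposedFields = Record<string, Record<string, ProposedCaps>>;

interface Ancestor {
  field: string;
  relationship: string; // how the level below relates to this field
}

// Follow "parent" links upward, nearest ancestor first.
function ancestors(fields: ProposedFields, name: string): Ancestor[] {
  const chain: Ancestor[] = [];
  const seen = new Set<string>([name]);
  let current = name;
  for (;;) {
    const byType: Record<string, ProposedCaps> = fields[current] ?? {};
    // Assumes every type entry of a field reports the same parent.
    const caps = Object.values(byType).find((c) => c.parent !== undefined);
    if (caps === undefined || caps.parent === undefined) break; // reached the root
    if (seen.has(caps.parent)) break; // guard against malformed cycles
    chain.push({ field: caps.parent, relationship: caps["parent-type"] ?? "unknown" });
    seen.add(caps.parent);
    current = caps.parent;
  }
  return chain;
}
```

For `grandparent.parent.field.with.dot.keyword` this walks to `grandparent.parent.field.with.dot` (multifield), then `grandparent.parent` (nested), then `grandparent` (nested).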

Possibly parent-type is the wrong name, since it describes the type of the relationship rather than the type of the parent. So it could instead be a parent object with field and relationship fields.
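The alternative shape can be written down as a type, with a converter from the flat `parent`/`parent-type` form. This is a hypothetical sketch: the relationship values are the ones listed earlier in this thread (nested, multifield, flattened, composite) and may not be complete, and `toParentObject` is a made-up helper.

```typescript
// Hypothetical shapes; the relationship list comes from this thread and may be incomplete.
type Relationship = "nested" | "multifield" | "flattened" | "composite";

interface ParentInfo {
  field: string;              // name of the parent field
  relationship: Relationship; // type of the relationship, not of the parent
}

interface CapsWithParentObject {
  type: string;
  parent?: ParentInfo;
}

// Convert the flat "parent" / "parent-type" form into the object form.
function toParentObject(flat: {
  type: string;
  parent?: string;
  "parent-type"?: Relationship;
}): CapsWithParentObject {
  const field = flat.parent;
  const relationship = flat["parent-type"];
  if (field === undefined || relationship === undefined) {
    return { type: flat.type }; // no (complete) parent information
  }
  return { type: flat.type, parent: { field, relationship } };
}
```

Grouping the two values in one object also makes it impossible to send a relationship without the field it refers to.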

romseygeek commented 3 years ago

We will also have to consider BWC carefully here: AIUI field caps are available through CCS (cross-cluster search), so if we change the response format we will need to handle responses from remote clusters running older versions of ES.
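One consequence for consumers: an optional key that older clusters never emit is ambiguous, since its absence may mean "no parent" or "this cluster doesn't know". A minimal sketch of keeping those cases apart, assuming a hypothetical `emitsParentInfo` flag that would in practice have to be derived from the responding cluster's version; `readParent` and `ParentKnowledge` are illustrative names.

```typescript
// Tri-state so that "older remote" is never mistaken for "has no parent".
type ParentKnowledge =
  | { state: "known"; parent: string }
  | { state: "none" }
  | { state: "unknown" };

// `emitsParentInfo` is hypothetical: how a client learns that a remote
// cluster supports the new key (version check, capability flag) is an assumption.
function readParent(caps: { parent?: string }, emitsParentInfo: boolean): ParentKnowledge {
  if (!emitsParentInfo) {
    return { state: "unknown" }; // older cluster: absence tells us nothing
  }
  return caps.parent === undefined
    ? { state: "none" }
    : { state: "known", parent: caps.parent };
}
```

With "unknown" as a separate state, a client can fall back to its existing heuristic only for fields coming from older remotes.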

mattkime commented 3 years ago

Just a note that @timroes found a case where we're unable to tell the difference between an object field and a multi-field due to insufficient information and usually-but-not-always correct assumptions.

mattkime commented 3 years ago

@romseygeek I'm planning for 7.16 and I'm curious if this issue might move forward soon.

Related question - do we have a sufficient problem statement and proposed solution?

romseygeek commented 3 years ago

I'm sure this will get in for 7.16. I plan to start work on it properly this week. I think we have a reasonably well-defined issue and solution here?

romseygeek commented 3 years ago

One open question: what should we do when different indexes have different field structures? For example, what if foo.text is mapped as a multi-field of foo in one index, but as an ordinary field under an object foo in another? We've sort of punted on this for metadata fields by setting metadata_field to true if it's true for any index, but it's not at all obvious what to do for parent info.
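The two obvious merge policies can be compared in a small sketch. It assumes a hypothetical per-index input (index name to multi-field parent, `undefined` meaning none), which field caps does not return in this form; `mergeParents` and its policy names are illustrative. "any" mirrors the metadata_field approach; "all" reports a parent only when every index agrees.

```typescript
type MergePolicy = "any" | "all";

interface MergedParent {
  parent?: string;              // parent to report, if the policy allows one
  disagreeingIndices: string[]; // indices whose own answer differs from `parent`
}

// perIndex: index name -> multi-field parent in that index (undefined = none).
function mergeParents(
  perIndex: ReadonlyMap<string, string | undefined>,
  policy: MergePolicy,
): MergedParent {
  const distinct = new Set(perIndex.values()); // may contain undefined
  const parents = Array.from(distinct).filter((p): p is string => p !== undefined);
  let parent: string | undefined;
  // Only consider reporting a parent when all indices that have one agree on it.
  if (parents.length === 1) {
    // "any": one index is enough; "all": no index may lack the parent.
    if (policy === "any" || distinct.size === 1) parent = parents[0];
  }
  const disagreeingIndices = Array.from(perIndex)
    .filter(([, p]) => p !== parent)
    .map(([index]) => index)
    .sort();
  return { parent, disagreeingIndices };
}
```

Either way some indices end up disagreeing with the reported answer; the policy only decides which side the false positives fall on.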

timroes commented 3 years ago

I've thought a bit about that. I think Discover is currently the main consumer in Kibana of the multi-field logic, which it uses to group multi-fields under their corresponding parent field:

[Screenshot (2021-09-13): multi-fields grouped under their parent field in Discover]

By default, Discover therefore also doesn't show their values in the table/document view, since they are assumed to be mostly the same as the parent field's value.

Whatever we decide for index patterns where some indexes map a field as a multi-field and others don't, I think either solution will produce some false positives: either we'll sometimes hide values that actually come from a different field, or we'll sometimes be overly eager and show values that are actually multi-fields.

I currently lean towards still attaching that information to the parent, and thus potentially hiding fields from some indexes that were actually not multi-fields. That said, this is more of a gut feeling, and if anyone has good rational arguments for one solution or the other, I'm glad to hear them.

Update: This would mean, though, that the original issue I filed in Kibana (https://github.com/elastic/kibana/issues/105238) would still persist :D since we'd still detect the same scenario as a multi-field in this case... :thinking:

@mattkime @kertal Happy to hear your thoughts on this.

romseygeek commented 2 years ago

This fell by the wayside a bit, but I'd like to propose an alternative solution that I think would deal with differing structures across indexes more cleanly. Instead of including parent information, we could add some filtering options to field caps calls. For example, you could add a filter=-multifields parameter to exclude any fields defined as multi-fields from the response. Similarly, we have internal calls (e.g. from DataFrames or SQL) that want to exclude nested fields, so we could add filter=-nested, and the time series service that wants to collect dimensions could add filter=+dimension. Would this work for Kibana? I don't know how often you call field caps, or whether you cache responses across different calls from different plugins.
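The proposed "-flag" / "+flag" semantics can be simulated client-side to make them concrete. This is a sketch of the proposal only: the token names follow this comment, the parameter that eventually shipped with #83636 may be spelled differently (check the field caps API docs), and the `FieldInfo` model and `applyFilters` helper are hypothetical.

```typescript
// Client-side simulation of the proposed filter tokens; the data model is hypothetical.
type Flag = "multifields" | "nested" | "dimension";
const FLAGS: readonly Flag[] = ["multifields", "nested", "dimension"];

interface FieldInfo {
  name: string;
  multifields: boolean; // defined as a multi-field of another field
  nested: boolean;      // lives inside a nested object
  dimension: boolean;   // time series dimension
}

function isFlag(s: string): s is Flag {
  return (FLAGS as readonly string[]).indexOf(s) !== -1;
}

// "-x" drops fields that have flag x; "+x" keeps only fields that have it.
// All tokens must hold for a field to be kept.
function applyFilters(fields: FieldInfo[], tokens: string[]): FieldInfo[] {
  return fields.filter((field) =>
    tokens.every((token) => {
      const sign = token.charAt(0);
      const flag = token.slice(1);
      if (!isFlag(flag) || (sign !== "+" && sign !== "-")) {
        throw new Error(`unsupported filter token: ${token}`);
      }
      return sign === "+" ? field[flag] : !field[flag];
    }),
  );
}
```

Such filters only remove fields from the response; they do not say which field a multi-field belongs to.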

romseygeek commented 2 years ago

Resolved by #83636

mattkime commented 1 year ago

I don't think filtering out multifields is the same as communicating multifield information.

That said, this isn't currently a priority. Perhaps we'll circle back to it in the future.

sophiec20 commented 1 year ago

There is a trend for index patterns to encompass many indices, e.g. logs-*.

We had a hard-to-diagnose issue where a "single" index out of many had log as a field, whereas the rest had log as an object. This was a mapping conflict over CCS (with ~60 remotes).

Diagnosis was very hard because we needed high-privilege access to the remote clusters to manually check the mappings of many, many indices.

Being able to understand field types from _field_caps remains a requirement, and this needs to be supported over CCS. It allows application UIs to display information better, and makes warning messages (e.g. about mapping conflicts) meaningful and useful.
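Part of this is already visible in a _field_caps response. When a field has different types across indices, the field gets one entry per type, and each entry lists the indices using that type under `indices`. A minimal sketch that surfaces such conflicts; the interfaces model only the parts of the response used here, and `findTypeConflicts` is a hypothetical helper.

```typescript
// Models only the parts of a _field_caps response used below.
interface TypeCaps {
  type: string;
  indices?: string[]; // present when only some indices use this type
}

interface FieldCapsResponse {
  indices: string[];
  fields: Record<string, Record<string, TypeCaps>>;
}

interface TypeConflict {
  field: string;
  indicesByType: Record<string, string[]>;
}

// Report every field that is mapped with more than one type across indices.
function findTypeConflicts(resp: FieldCapsResponse): TypeConflict[] {
  const conflicts: TypeConflict[] = [];
  for (const [field, byType] of Object.entries(resp.fields)) {
    // With include_unmapped=true an "unmapped" entry can appear; it is not
    // treated as a type conflict here.
    const mapped = Object.entries(byType).filter(([type]) => type !== "unmapped");
    if (mapped.length < 2) continue; // one type everywhere
    const indicesByType: Record<string, string[]> = {};
    for (const [type, caps] of mapped) indicesByType[type] = caps.indices ?? [];
    conflicts.push({ field, indicesByType });
  }
  return conflicts;
}
```

Over CCS, remote indices are reported with their cluster alias prefix (`cluster:index`), so a conflict like the `log` field/object case above would point directly at the remote index to inspect. It still would not explain parent/child relationships, which is what this issue asks for.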

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)