elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

More details on exception when having a `nested` field with `date_range` and `include_in_root` set #89164

Open ibotello opened 2 years ago

ibotello commented 2 years ago

Elasticsearch Version

7.14.2

Installed Plugins

No response

Java Version

bundled

OS Version

ESS

Problem Description

We get an exception when multiple nested documents each try to add a binary doc values field to the root document. The exception occurs because Lucene allows only a single instance of a binary doc values field per document. It would be nice to have a more detailed exception message that explains what is actually happening.

Steps to Reproduce

Create a sample index with a nested field that contains a date_range field:

PUT my-index-00001
{
  "mappings": {
    "properties": {
      "NestedField": {
        "type": "nested",
        "include_in_root": true,
        "properties": {
          "DateField": {
            "type": "date_range"
          }
        }
      }
    }
  }
}

Trying to ingest one document:

PUT my-index-00001/_doc/1
{
  "NestedField": [
    {
      "DateField": {
        "gte": "2015-10-31",
        "lte": "2015-10-31"
      }
    },
    {
      "DateField": {
        "gte": "2016-10-31",
        "lte": "2016-10-31"
      }
    }
  ]
}

We get the following exception:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "DocValuesField \"NestedField.DateField\" appears more than once in this document (only one value is allowed per field)"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "DocValuesField \"NestedField.DateField\" appears more than once in this document (only one value is allowed per field)"
  },
  "status" : 400
}

Logs (if relevant)

No response

josefschiefer27 commented 2 years ago

I am running into the same issue with other data types (in my case wildcard; interestingly, keyword works, see below).

For instance, if I create an index with

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "user": {
        "type": "nested",
        "include_in_root": true,
        "properties": {
          "first": {
            "type": "keyword"
          },
          "last": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

With this index the following ingestion will work:

PUT my-index-000001/_doc/1
{
  "user": [
      { "first" : "Bob", "last" : "Kesler" },
      { "first" : "Robert", "last" : "Maxim" }
    ]
}

However, if I change the data type to wildcard

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "user": {
        "type": "nested",
        "include_in_root": true,
        "properties": {
          "first": {
            "type": "wildcard"
          },
          "last": {
            "type": "wildcard"
          }
        }
      }
    }
  }
}

... the same ingestion command will fail with the error "DocValuesField \"user.first\" appears more than once in this document (only one value is allowed per field)".

PUT my-index-000001/_doc/1
{
  "user": [
      { "first" : "Bob", "last" : "Kesler" },
      { "first" : "Robert", "last" : "Maxim" }
    ]
}

The interesting thing is that the following command works:

PUT my-index-000001/_doc/1
{
  "user": [
      { "first" : "Bob", "last" : "Kesler" }
    ]
}

So it looks like there is only an issue when there is an array of nested objects using include_in_root or include_in_parent.
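To make the pattern above easier to spot in an existing mapping, here is a minimal Python sketch that walks a mapping's `properties` and lists sub-fields that sit under a `nested` field with `include_in_root` or `include_in_parent` enabled. The helper name `find_risky_fields` and the set `REPORTED_FAILING_TYPES` are hypothetical, and the set contains only the types reported in this thread (`date_range`, `wildcard`); it is not an authoritative list of affected types.

```python
# Assumption: only the field types reported as failing in this thread.
# This is not an exhaustive or authoritative list.
REPORTED_FAILING_TYPES = {"date_range", "wildcard"}


def find_risky_fields(properties, prefix="", copied_up=False):
    """Return dotted paths of sub-fields that are copied to a parent or root
    document and whose type is in REPORTED_FAILING_TYPES."""
    risky = []
    for name, spec in properties.items():
        path = prefix + name
        field_type = spec.get("type", "object")
        if field_type == "nested":
            # A nested field copies its children up only if one of its own flags is set.
            is_copied = bool(spec.get("include_in_root") or spec.get("include_in_parent"))
        else:
            # Plain object fields inherit the state of the enclosing nested field.
            is_copied = copied_up
        if is_copied and field_type in REPORTED_FAILING_TYPES:
            risky.append(path)
        # Recurse into sub-fields, if any.
        risky.extend(find_risky_fields(spec.get("properties", {}), path + ".", is_copied))
    return risky


wildcard_mapping = {
    "user": {
        "type": "nested",
        "include_in_root": True,
        "properties": {
            "first": {"type": "wildcard"},
            "last": {"type": "wildcard"},
        },
    }
}
print(find_risky_fields(wildcard_mapping))
```

A flagged field only fails at index time when the array holds more than one nested object, so the result is a warning list rather than a guarantee of failure.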

josefschiefer27 commented 1 year ago

Any updates on this issue? It would be nice to understand its root cause. The described behavior is a bug that only surfaces with certain data types when using nested with include_in_root or include_in_parent.

patrick-radius commented 1 year ago

I'm having this issue too. It seems both include_in_root and include_in_parent behave the same.

lmignon commented 1 year ago

Any update about this issue? I'm facing the same one and it would be nice to know why and how we can fix it...

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

romseygeek commented 1 year ago

This is the same issue as #70261. Field types that store information in a Lucene binary doc values field don't work with include_in_root or include_in_parent because each child will try to store its information on the parent document separately, and Lucene only allows a single binary doc values field instance per doc.
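The explanation above can be illustrated with a toy model in Python. This is not Elasticsearch or Lucene code: `ToyRootDocument`, `copy_children_to_root`, and `SINGLE_INSTANCE_TYPES` are hypothetical names, and the set of single-instance types is an assumption taken from the reports in this thread. The sketch only mimics the rule that some doc values fields accept one instance per document while others, such as keyword, accept several.

```python
# Assumption: types whose doc values allow a single instance per document,
# based only on the reports in this thread.
SINGLE_INSTANCE_TYPES = {"date_range", "wildcard"}


class ToyRootDocument:
    """Toy stand-in for the root document that receives copied child fields."""

    def __init__(self):
        self.doc_values = {}  # field path -> list of values added so far

    def add(self, path, field_type, value):
        # Reject a second instance for single-instance types, mirroring the error text.
        if field_type in SINGLE_INSTANCE_TYPES and path in self.doc_values:
            raise ValueError(
                f'DocValuesField "{path}" appears more than once in this document '
                "(only one value is allowed per field)"
            )
        self.doc_values.setdefault(path, []).append(value)


def copy_children_to_root(nested_name, field_types, children):
    """Copy every child's fields onto one root document, one child at a time."""
    root = ToyRootDocument()
    for child in children:
        for name, value in child.items():
            root.add(f"{nested_name}.{name}", field_types[name], value)
    return root


users = [{"first": "Bob", "last": "Kesler"}, {"first": "Robert", "last": "Maxim"}]

# keyword: both children can add their values to the root document.
ok = copy_children_to_root("user", {"first": "keyword", "last": "keyword"}, users)
print(ok.doc_values)

# wildcard: the second child triggers the error seen in this issue.
try:
    copy_children_to_root("user", {"first": "wildcard", "last": "wildcard"}, users)
except ValueError as err:
    print(err)
```

In this model a single-element array succeeds for every type, which matches the observation earlier in the thread that the failure only appears with more than one nested object.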

We have a couple of options I think:

josefschiefer27 commented 1 year ago

Would Lucene issue #11702 fix this problem?

josefschiefer27 commented 1 year ago

@romseygeek any plans to fix this issue? As it stands right now, include_in_root and include_in_parent are broken for certain data types.

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)