elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Regression: Requesting source no longer returns empty array when explicitly requesting #69391

Open simlu opened 3 years ago

simlu commented 3 years ago

This ticket is created as a follow-up to https://github.com/elastic/elasticsearch/issues/23796

When `_source` filtering requests sub-fields of a field whose value is an empty array, we expect the empty array itself to be returned.

In the example below, the nested `children` field is not returned when it is empty and only its sub-fields (here `children.id`) are requested. However, when `children` itself is explicitly requested, the empty array is returned, but so are all fields contained in it (e.g. `name`), not just the requested ones.


Specs:

- **Elasticsearch version** (`bin/elasticsearch --version`): 7.9.0
- **Plugins installed**: []
- **JVM version** (`java -version`): using the official Docker image
- **OS version** (`uname -a` if on a Unix-like system): using the official Docker image

Steps to reproduce:

Create the following files:

index.json
```json
{
  "mappings": {
    "dynamic": "false",
    "properties": {
      "id": { "type": "keyword" },
      "children": {
        "properties": {
          "id": { "type": "keyword" },
          "name": { "type": "text" }
        },
        "type": "nested"
      }
    }
  }
}
```

data.xjson
```json
{ "index": { "_index": "entity", "_id": "1" } }
{ "id": "1", "children": [{ "id": "2", "name": "A" }, { "id": "3", "name": "B" }] }
{ "index": { "_index": "entity", "_id": "3" } }
{ "id": "1", "children": [] }
```

query1.json
```json
{ "_source": [ "id", "children.id" ], "size": 20, "from": 0 }
```

query2.json
```json
{ "_source": [ "id", "children", "children.id" ], "size": 20, "from": 0 }
```

Then start a Docker container with:

```bash
docker run -p 9201:9200 -e "discovery.type=single-node" -d docker.elastic.co/elasticsearch/elasticsearch:7.9.0
```

and run:

```bash
printf 'Deleting index...\n'
curl localhost:9201/entity -X DELETE >> /dev/null
printf '\n\n'

printf 'Creating index...\n'
curl localhost:9201/entity -X PUT -d @index.json -H "Content-Type: application/json"
printf '\n\n'

printf 'Creating data...\n'
curl localhost:9201/_bulk -X POST --data-binary @data.xjson -H "Content-Type: application/x-ndjson"
printf '\n\n'

printf 'Refreshing index...\n'
curl localhost:9201/entity/_refresh
printf '\n\n'

# URLs containing '?' are quoted so shells such as zsh don't treat it as a glob
printf 'Querying data...\n'
curl 'localhost:9201/entity/_search?pretty' -d @query1.json -H "Content-Type: application/json"
printf '\n\n'

printf 'Querying data...\n'
curl 'localhost:9201/entity/_search?pretty' -d @query2.json -H "Content-Type: application/json"
printf '\n\n'
```

Logs:

```txt
Deleting index...

Creating index...
{"acknowledged":true,"shards_acknowledged":true,"index":"entity"}

Creating data...
{"took":26,"errors":false,"items":[{"index":{"_index":"entity","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"entity","_type":"_doc","_id":"3","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1,"status":201}}]}

Refreshing index...
{"_shards":{"total":2,"successful":1,"failed":0}}

Querying data...
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "entity",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "children" : [
            {
              "id" : "2"
            },
            {
              "id" : "3"
            }
          ],
          "id" : "1"
        }
      },
      {
        "_index" : "entity",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "id" : "1"
        }
      }
    ]
  }
}

Querying data...
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "entity",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "children" : [
            {
              "name" : "A",
              "id" : "2"
            },
            {
              "name" : "B",
              "id" : "3"
            }
          ],
          "id" : "1"
        }
      },
      {
        "_index" : "entity",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "children" : [ ],
          "id" : "1"
        }
      }
    ]
  }
}
```
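For comparison, this is the `_source` the reporter expects for the second hit of query1.json (document `_id` 3, indexed with `"children": []`), written out from the expectation stated at the top of this issue: the empty `children` array is kept even though only `children.id` was requested. It is a sketch of the expected result as described, not what 7.9.0 returns for query1.json:

```json
{
  "children": [],
  "id": "1"
}
```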
simlu commented 3 years ago

This gets more interesting when nested documents are several levels deep. For example, for

```json
{
  "id": "1",
  "children": [
    {
      "id": "2",
      "name": "A"
    },
    {
      "id": "3",
      "name": "B",
      "grandchildren": [
        {
          "id": "4",
          "name": "C"
        }
      ]
    }
  ]
}
```

and the `_source` filter `["id", "children.grandchildren.id"]`, we get back:

```json
{
  "children": [
    {
      "grandchildren": [
        {
          "id": "4"
        }
      ]
    }
  ],
  "id": "1"
}
```

However, we would expect to get back both nested documents under `children`, with the one that is currently missing (`"id": "2"`) returned with an empty `grandchildren` field.
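Spelled out, the expected result described here would look roughly like the following. The first entry stands for the child with `"id": "2"`, which has no `grandchildren` in the source document; rendering its "empty `grandchildren` field" as an empty array is an assumption based on the sentence above. This is an illustration of the expectation, not actual Elasticsearch output:

```json
{
  "children": [
    {
      "grandchildren": []
    },
    {
      "grandchildren": [
        {
          "id": "4"
        }
      ]
    }
  ],
  "id": "1"
}
```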

elasticmachine commented 3 years ago

Pinging @elastic/es-search (Team:Search)

simlu commented 3 years ago

Any input on this? @elastic/es-search

simlu commented 3 years ago

@javanna Asking because you did the fix for the original issue. What would be required to make the analogous fix for this "reverse" regression? Cheers!

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)