elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Named query within nested query: matched_queries field evaluated in wrong context #46231

Open EmilBode opened 5 years ago

EmilBode commented 5 years ago

I noticed that when using a named query within a nested query, the top-level field matched_queries contains the wrong names. It looks like something is evaluated in the top-level context instead of within the nested context.

Reprex (with 2 different examples within):

Suppose I have the following index, with this document:

PUT newindex
{
  "mappings": {
    "properties": {
      "root": {
        "type": "nested",
        "properties": {
          "foo": {
            "type": "keyword"
          },
          "bar": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

PUT newindex/_doc/1
{
  "root": [
    {
      "foo": "gdhjkl",
      "bar": "gdhjkl2"
    },
    {
      "foo": "not_filled"
    }
  ]
}

And I want to write a query that returns "incomplete" docs, i.e. those where not all values are present for foo and bar, or where foo contains the value not_filled:

GET newindex/_search
{
  "query": {
    "nested": {
      "path": "root",
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "must_not": [
                  {
                    "exists": {
                      "field": "root.foo"
                    }
                  }
                ],
                "_name": "no foo"
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "exists": {
                      "field": "root.bar"
                    }
                  }
                ],
                "_name": "no bar"
              }
            },
            {
              "match": {
                "root.foo": {
                  "query": "not_filled",
                  "_name": "foo has wrong value"
                }
              }
            }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}

Actual output:

{
  ...
    "hits" : [
      {
        "_index" : "newindex",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "root" : [
            {
              "foo" : "gdhjkl",
              "bar" : "gdhjkl2"
            },
            {
              "foo" : "not_filled"
            }
          ]
        },
        "matched_queries" : [
          "no bar",
          "no foo"
        ],
        "inner_hits" : {
          "root" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.6931472,
              "hits" : [
                {
                  "_index" : "newindex",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "root",
                    "offset" : 1
                  },
                  "_score" : 0.6931472,
                  "_source" : {
                    "foo" : "not_filled"
                  },
                  "matched_queries" : [
                    "foo has wrong value",
                    "no bar"
                  ]
... (closing brackets/braces)...

Expected output:

In the top-level matched_queries field, I'd expect either nothing (as the queries only hit nested documents, not the top-level document) or ["no bar", "foo has wrong value"] (order is not really relevant). Instead, the name no foo is returned, even though that query did not give me a hit, while the name foo has wrong value is not returned, even though it did cause a hit. Note that the names in the inner_hits section are as expected.

For the top-level matched_queries field, I'd expect either:

  1. The matched queries on the top level only, in this case []. Any matched queries from a nested query can then be shown within the inner_hits section (as they already are).
  2. (Better, but perhaps more complicated) The matched queries on the top level plus the set of matched queries inherited from the nested query: in this case the same value as the matched_queries field in the inner_hits section.

My impression is that the query itself is executed correctly, but after that the named queries are evaluated separately, looking only at the document as a whole. There the fields root.foo and root.bar are missing, as they are not really part of the bare document, so the two must_not/exists queries match, while the match query on root.foo fails.
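The impression above can be illustrated with a small toy model. This is not Elasticsearch code and makes no claim about its internals; it only assumes, as the hypothesis does, that each nested object is stored as its own hidden document and that the root document carries no root.foo or root.bar values itself. The helper names (named_queries, matched_names) are made up for this sketch.

```python
# Toy model only: this is NOT Elasticsearch code. It mimics the hypothesis by
# treating each nested object as a separate flat document, while the root
# document has no root.* fields of its own.

def named_queries():
    """Return the three named predicates from the first example query."""
    return {
        "no foo": lambda doc: "root.foo" not in doc,
        "no bar": lambda doc: "root.bar" not in doc,
        "foo has wrong value": lambda doc: doc.get("root.foo") == "not_filled",
    }


def matched_names(doc):
    """Names of all predicates that hold for a single flat document."""
    return sorted(name for name, pred in named_queries().items() if pred(doc))


# The two nested objects of document 1, flattened to field paths.
nested_docs = [
    {"root.foo": "gdhjkl", "root.bar": "gdhjkl2"},
    {"root.foo": "not_filled"},
]
# The root document as assumed by the hypothesis: no root.* fields.
root_doc = {}

top_level = matched_names(root_doc)
per_nested = [matched_names(d) for d in nested_docs]
print(top_level)   # ['no bar', 'no foo']
print(per_nested)  # [[], ['foo has wrong value', 'no bar']]
```

Under these assumptions the toy model reproduces both lists from the actual output: the root-level evaluation yields no bar and no foo, while the second nested object yields foo has wrong value and no bar.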

Fix

For option 1 output, we could simply ignore any named queries inside nested queries when computing the top-level matched_queries field. Option 2 output may be a bit more complicated, but we could compute the option 1 set of names and then combine them with the names from the inner_hits (removing duplicates).
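As a rough illustration of the option 2 combination, here is a client-side sketch that post-processes a search response. It is not a patch for Elasticsearch itself; the function name merged_matched_queries and the top_level_names argument (standing in for the option 1 set) are assumptions made for this example, and the hit below is trimmed from the actual output shown earlier.

```python
def merged_matched_queries(hit, top_level_names):
    """Combine top-level names with the names reported by all inner hits.

    `hit` is one entry of response["hits"]["hits"]. `top_level_names` is the
    hypothetical option 1 list (names matched outside any nested query).
    Order of first appearance is kept and duplicates are dropped.
    """
    merged = list(dict.fromkeys(top_level_names))
    for nested in hit.get("inner_hits", {}).values():
        for inner in nested["hits"]["hits"]:
            for name in inner.get("matched_queries", []):
                if name not in merged:
                    merged.append(name)
    return merged


# Trimmed version of the hit from the first example.
hit = {
    "_id": "1",
    "matched_queries": ["no bar", "no foo"],
    "inner_hits": {
        "root": {
            "hits": {
                "hits": [
                    {
                        "_nested": {"field": "root", "offset": 1},
                        "matched_queries": ["foo has wrong value", "no bar"],
                    }
                ]
            }
        }
    },
}

# In the first example nothing matches outside the nested query,
# so the option 1 list is empty.
result = merged_matched_queries(hit, [])
print(result)  # ['foo has wrong value', 'no bar']
```

The sketch deliberately ignores the top-level matched_queries value returned by the server, since that is the field this issue reports as unreliable.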

System details:

Elasticsearch version: 7.3.1 (also seen on 6.8.2)
JVM version (java -version): 1.8.0_221
OS version: Windows 10 (64-bit)

EmilBode commented 5 years ago

Update

I've been trying some more things, and I've found some even worse behaviour. When there are multiple nested fields and a query targets both, some of the names from the query part on field A end up in the inner hits of field B.

I'm sorry this will be quite a lot of code, but I couldn't get it much more minimal while still showing what I mean.

The problem is right at the bottom, where the queries no field4 and no field5 are listed in the inner hits of the root field. Note the difference between the two parts: the named queries from otherroot do leak over to the part from root, but not the other way around. I'm assuming order may have something to do with that, but I haven't tested that hypothesis.
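A leak of this kind can be detected on the client by comparing the names reported on an inner hit with the names that were actually defined under that nested path in the request. The following is only a sketch of that check, not anything Elasticsearch provides: names_by_nested_path and leaked_names are hypothetical helpers, and the query and inner hit are trimmed stand-ins for the full request and output.

```python
def names_by_nested_path(node, path=None, out=None):
    """Walk a query body and collect every `_name`, keyed by enclosing nested path.

    Names defined outside any nested clause are stored under the key None.
    """
    if out is None:
        out = {}
    if isinstance(node, list):
        for item in node:
            names_by_nested_path(item, path, out)
    elif isinstance(node, dict):
        for key, value in node.items():
            if key == "_name":
                out.setdefault(path, set()).add(value)
            elif key == "nested" and isinstance(value, dict):
                # A name on the nested clause itself belongs to the outer context.
                if "_name" in value:
                    out.setdefault(path, set()).add(value["_name"])
                # Everything inside its query belongs to the nested path.
                names_by_nested_path(value.get("query"), value.get("path"), out)
            else:
                names_by_nested_path(value, path, out)
    return out


def leaked_names(inner_hit, names):
    """Names on an inner hit that were not defined under its own nested path."""
    own = names.get(inner_hit["_nested"]["field"], set())
    return [n for n in inner_hit.get("matched_queries", []) if n not in own]


query = {"bool": {"should": [
    {"nested": {"path": "root", "query": {"bool": {"should": [
        {"bool": {"must_not": [{"exists": {"field": "root.bar"}}],
                  "_name": "no bar"}},
        {"match": {"root.foo": {"query": "not_filled",
                                "_name": "foo has wrong value"}}},
    ]}}}},
    {"nested": {"path": "otherroot", "query": {"bool": {"should": [
        {"bool": {"must_not": [{"exists": {"field": "otherroot.field4"}}],
                  "_name": "no field4"}},
        {"bool": {"must_not": [{"exists": {"field": "otherroot.field5"}}],
                  "_name": "no field5"}},
    ]}}}},
]}}

inner_hit = {
    "_nested": {"field": "root", "offset": 1},
    "matched_queries": ["foo has wrong value", "no bar", "no field5", "no field4"],
}

names = names_by_nested_path(query)
leaked = leaked_names(inner_hit, names)
print(leaked)  # ['no field5', 'no field4']
```

The walk treats any key called nested whose value is an object as a nested clause, which is good enough for queries shaped like the ones in this report but is a simplification.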

New mapping

PUT newindex
{
  "mappings": {
    "properties": {
      "root": {
        "type": "nested",
        "properties": {
          "foo": {
            "type": "keyword"
          },
          "bar": {
            "type": "keyword"
          }
        }
      },
      "baz": {
        "type": "keyword"
      },
      "otherroot": {
        "type": "nested",
        "properties": {
          "field4": {
            "type": "keyword"
          },
          "field5": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

New document

PUT newindex/_doc/1
{
  "root": [
    {
      "foo": "gdhjkl",
      "bar": "gdhjkl2"
    },
    {
      "foo": "not_filled"
    }
  ],
  "baz": "someval",
  "otherroot": [
    {
      "field4": "fwuvesd",
      "field5": "gbsduil"
    },
    {
      "field4": "dnbfjskl"
    }
  ]
}

Query

GET newindex/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "root",
            "query": {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "root.foo"
                          }
                        }
                      ],
                      "_name": "no foo"
                    }
                  },
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "root.bar"
                          }
                        }
                      ],
                      "_name": "no bar"
                    }
                  },
                  {
                    "match": {
                      "root.foo": {
                        "query": "not_filled",
                        "_name": "foo has wrong value"
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "exists": {
                  "field": "baz"
                }
              }
            ],
            "_name": "no baz"
          }
        },
        {
          "nested": {
            "path": "otherroot",
            "query": {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "otherroot.field4"
                          }
                        }
                      ],
                      "_name": "no field4"
                    }
                  },
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "otherroot.field5"
                          }
                        }
                      ],
                      "_name": "no field5"
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}

Output

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "newindex",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "root" : [
            {
              "foo" : "gdhjkl",
              "bar" : "gdhjkl2"
            },
            {
              "foo" : "not_filled"
            }
          ],
          "baz" : "someval",
          "otherroot" : [
            {
              "field4" : "fwuvesd",
              "field5" : "gbsduil"
            },
            {
              "field4" : "dnbfjskl"
            }
          ]
        },
        "matched_queries" : [
          "no bar",
          "no foo",
          "no field5",
          "no field4"
        ],
        "inner_hits" : {
          "otherroot" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.0,
              "hits" : [
                {
                  "_index" : "newindex",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "otherroot",
                    "offset" : 1
                  },
                  "_score" : 0.0,
                  "_source" : {
                    "field4" : "dnbfjskl"
                  },
                  "matched_queries" : [
                    "no field5"
                  ]
                }
              ]
            }
          },
          "root" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.6931472,
              "hits" : [
                {
                  "_index" : "newindex",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "root",
                    "offset" : 1
                  },
                  "_score" : 0.6931472,
                  "_source" : {
                    "foo" : "not_filled"
                  },
                  "matched_queries" : [
                    "foo has wrong value",
                    "no bar",
                    "no field5",
                    "no field4"
                  ]
                }
              ]
            }
          }
        }
      }
    ]
  }
}

elasticmachine commented 5 years ago

Pinging @elastic/es-search

PeledYuval commented 4 years ago

Hey, my team just encountered this exact bug. We would love some input on this from the elastic team.

Thank you

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)