Open EmilBode opened 5 years ago
Update: I've been trying some more things, and I've found some even worse behaviour. When there are multiple nested fields and the query targets both, some of the query names from the part on field A end up in the inner hits of field B.
I'm sorry this will be quite a lot of code, but I couldn't get it much more minimal while still showing what I mean.
The problem is right at the bottom, where you see the queries "no field4" and "no field5" mentioned in the inner hits from the root field.
Note the difference between the 2 parts: the named queries from otherroot do leak over to the part from root, but not the other way around. I'm assuming order may have something to do with that, but I haven't tested that hypothesis.
New mapping
PUT newindex
{
  "mappings": {
    "properties": {
      "root": {
        "type": "nested",
        "properties": {
          "foo": {
            "type": "keyword"
          },
          "bar": {
            "type": "keyword"
          }
        }
      },
      "baz": {
        "type": "keyword"
      },
      "otherroot": {
        "type": "nested",
        "properties": {
          "field4": {
            "type": "keyword"
          },
          "field5": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
New document
PUT newindex/_doc/1
{
  "root": [
    {
      "foo": "gdhjkl",
      "bar": "gdhjkl2"
    },
    {
      "foo": "not_filled"
    }
  ],
  "baz": "someval",
  "otherroot": [
    {
      "field4": "fwuvesd",
      "field5": "gbsduil"
    },
    {
      "field4": "dnbfjskl"
    }
  ]
}
Query
GET newindex/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "root",
            "query": {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "root.foo"
                          }
                        }
                      ],
                      "_name": "no foo"
                    }
                  },
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "root.bar"
                          }
                        }
                      ],
                      "_name": "no bar"
                    }
                  },
                  {
                    "match": {
                      "root.foo": {
                        "query": "not_filled",
                        "_name": "foo has wrong value"
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "exists": {
                  "field": "baz"
                }
              }
            ],
            "_name": "no baz"
          }
        },
        {
          "nested": {
            "path": "otherroot",
            "query": {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "otherroot.field4"
                          }
                        }
                      ],
                      "_name": "no field4"
                    }
                  },
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "otherroot.field5"
                          }
                        }
                      ],
                      "_name": "no field5"
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}
Output
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "newindex",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "root" : [
            {
              "foo" : "gdhjkl",
              "bar" : "gdhjkl2"
            },
            {
              "foo" : "not_filled"
            }
          ],
          "baz" : "someval",
          "otherroot" : [
            {
              "field4" : "fwuvesd",
              "field5" : "gbsduil"
            },
            {
              "field4" : "dnbfjskl"
            }
          ]
        },
        "matched_queries" : [
          "no bar",
          "no foo",
          "no field5",
          "no field4"
        ],
        "inner_hits" : {
          "otherroot" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.0,
              "hits" : [
                {
                  "_index" : "newindex",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "otherroot",
                    "offset" : 1
                  },
                  "_score" : 0.0,
                  "_source" : {
                    "field4" : "dnbfjskl"
                  },
                  "matched_queries" : [
                    "no field5"
                  ]
                }
              ]
            }
          },
          "root" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.6931472,
              "hits" : [
                {
                  "_index" : "newindex",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "root",
                    "offset" : 1
                  },
                  "_score" : 0.6931472,
                  "_source" : {
                    "foo" : "not_filled"
                  },
                  "matched_queries" : [
                    "foo has wrong value",
                    "no bar",
                    "no field5",
                    "no field4"
                  ]
                }
              ]
            }
          }
        }
      }
    ]
  }
}
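To make the leak in the output above easier to spot, here is a minimal Python sketch that flags names appearing in an inner hit that were not declared under that nested path. It is only a client-side check, not anything Elasticsearch provides: the helper name leaked_names and the DECLARED_NAMES table are hypothetical, the table is transcribed by hand from the query above, and the sketch assumes each inner_hits key equals the nested path (as it does here, since inner_hits is left unnamed).

```python
from typing import Dict, List, Set

# Hypothetical lookup table: the _name values declared under each nested path,
# copied by hand from the query in this issue.
DECLARED_NAMES: Dict[str, Set[str]] = {
    "root": {"no foo", "no bar", "foo has wrong value"},
    "otherroot": {"no field4", "no field5"},
}


def leaked_names(hit: dict, declared: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Per inner_hits key, list the matched_queries names not declared for that path."""
    leaks: Dict[str, List[str]] = {}
    for path, block in hit.get("inner_hits", {}).items():
        allowed = declared.get(path, set())
        foreign: List[str] = []
        for inner in block["hits"]["hits"]:
            for name in inner.get("matched_queries", []):
                if name not in allowed and name not in foreign:
                    foreign.append(name)
        if foreign:
            leaks[path] = foreign
    return leaks


# Trimmed copy of the single hit from the output above (only the relevant keys).
hit = {
    "inner_hits": {
        "otherroot": {"hits": {"hits": [{"matched_queries": ["no field5"]}]}},
        "root": {
            "hits": {
                "hits": [
                    {
                        "matched_queries": [
                            "foo has wrong value",
                            "no bar",
                            "no field5",
                            "no field4",
                        ]
                    }
                ]
            }
        },
    }
}

print(leaked_names(hit, DECLARED_NAMES))
# {'root': ['no field5', 'no field4']}
```

For this response the check reports leaks only under root, which matches the one-directional behaviour described above.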
Pinging @elastic/es-search
Hey, my team just encountered this exact bug. We would love some input on this from the elastic team.
Thank you
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-search-relevance (Team:Search Relevance)
I noticed that when using a named query within a nested query, the top-level field matched_queries contains the wrong names. It looks like something is evaluated in the top-level context instead of within the nested context.
Reprex (with 2 different examples within):
Suppose I have the following index, with this document:
And I want to write a query that returns "incomplete" docs, i.e. those where not all values are present for foo and bar, or where foo contains the value not_filled:
Actual output:
Expected output
In the top-level matched_queries field, I'd expect either nothing (as the queries only hit nested documents, nothing on the top level), or ["no bar", "foo has wrong value"] (order is not really relevant). In this case, the name "no foo" is returned, even though this query did not give me a hit. At the same time, the name "foo has wrong value" is not returned, even though it did cause a hit. Note that the names in the inner_hits section are as expected.
On the top-level field matched_queries, I'd either:
1. Expect an empty list, []. Any matched queries from a nested query can then be shown within the inner_hits section (as they are now as well).
2. Expect the names from the matched_queries field in the inner_hits section.
My impression is that the query itself is executed correctly, but after that, the named queries are evaluated separately, only looking at the document as a whole (where the fields root.foo and root.bar are missing, as they are not really part of the bare document, and on the other hand the match query on foo fails).
Fix
For option 1 output, we could simply ignore any named queries within nested queries when computing the top-level matched_queries field. Option 2 output may be a bit more complicated, but we could compute the option 1 set of names, then later combine them with the names from the inner_hits (removing duplicates along the way).
System details:
Elasticsearch version: 7.3.1 (and also seen on 6.8.2)
JVM version (java -version): 1.8.0_221
OS version: Windows 10 (64-bit)
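The two proposed outputs can be sketched as client-side post-processing of a search hit. This is a rough Python illustration of the idea only, not how Elasticsearch would implement the fix internally: the function names option1_names and option2_names, the NESTED_NAMES set, and the sample_hit dictionary are all hypothetical, and the sample is shaped after the description above rather than copied from a real response.

```python
from typing import List, Set

# Hypothetical: names declared inside the nested query of the first example.
NESTED_NAMES: Set[str] = {"no foo", "no bar", "foo has wrong value"}


def option1_names(top_level: List[str], nested_names: Set[str]) -> List[str]:
    """Option 1: ignore every name that was declared inside a nested query."""
    return [name for name in top_level if name not in nested_names]


def option2_names(hit: dict, nested_names: Set[str]) -> List[str]:
    """Option 2: option-1 names plus the inner_hits names, duplicates removed."""
    combined = option1_names(hit.get("matched_queries", []), nested_names)
    for block in hit.get("inner_hits", {}).values():
        for inner in block["hits"]["hits"]:
            for name in inner.get("matched_queries", []):
                if name not in combined:
                    combined.append(name)
    return combined


# Illustrative hit: wrong names at the top level, correct names in inner_hits.
sample_hit = {
    "matched_queries": ["no bar", "no foo"],
    "inner_hits": {
        "root": {
            "hits": {
                "hits": [{"matched_queries": ["foo has wrong value", "no bar"]}]
            }
        }
    },
}

print(option1_names(sample_hit["matched_queries"], NESTED_NAMES))
# []
print(option2_names(sample_hit, NESTED_NAMES))
# ['foo has wrong value', 'no bar']
```

Option 2 depends on the inner_hits names being correct, so it would carry over any names that leak between nested fields.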