Open jsuchal opened 9 years ago
+1 we should add this! I think we should also open an issue for this in Lucene, because the nested query uses the ToParentBlockJoinQuery Lucene query to do the actual work.
I opened: https://issues.apache.org/jira/browse/LUCENE-6354 to get this in Lucene
+1
+1
+1
+1
+1
+1
+1
This would be a great feature to have. The only reason why we are using parent/child instead of nested mapping is the lack of min_children/max_children options in the nested query. Considering that:
I would very much like to see this implemented. Please let me know if there's anything I can do to help.
Any update on this? Would be really, really good to have this feature!
@elastic/es-search-aggs
Stalled waiting for https://issues.apache.org/jira/browse/LUCENE-6354 to be completed and merged
+1
+1
I'd love to see this.
To give some context: while the main reason for us to migrate from nested to a parent/child model was indexing speed, we also did so because of the min_children and max_children feature.
However, we have become painfully aware of the cost of has_child queries (joins) as the number of child documents and/or the complexity of queries increases. OOM exceptions have become too frequent for comfort.
For stability reasons, we are reconsidering the nested model even if it means decreased indexing speed. Knowing that min_children and max_children are still planned for that model would reassure us.
Thank you!
Note: for comments and last updates, please refer to: GIST 290e31176f493814823a20f281e82fd4.
To support min_children and max_children for your nested query, all you have to do is use a function_score query. To make this more concrete: you have an index called Person with the following mapping:

- first_name (text)
- email (text)
- children (nested)
  - first_name (text)
  - last_name (text)

To effectively support min_children and max_children, there are multiple queries you need to consider: no children, a minimum number of children, a minimum and a maximum, and a maximum only. Depending on the scenario, the request will look different.

A few things to know about function_score:

- function_score supports min_score. It filters out any document whose score is lower than min_score.
- function_score has a max_boost. This doesn't filter the documents returned, it simply caps the score at a specific value. For instance, if the calculated score is 500 and max_boost is 50, 50 will be returned.
- If you don't want function_score to pollute the overall score of the document, apply a boost of 0.

Explanation: the no-children case is the easiest query; you simply have to verify there are no nested documents. It is significantly faster than using function_score.
{
  "query": {
    "bool": {
      "must_not": {
        "nested": {
          "path": "children",
          "query": {
            "exists": {
              "field": "children.first_name"
            }
          }
        }
      }
    }
  }
}
Explanation: each matching nested document is boosted by 10 and the nested query sums those scores. The function_score then filters out any document whose score is lower than min_score.
Example: find all persons who have a minimum of 2 children. The boost applied here is 10 (you can set any number you want), so the min_score is 20 (2 * 10).
Before Elastic 7:
{
  "query": {
    "function_score": {
      "min_score": 20,
      "boost": 1,
      "query": {
        "nested": {
          "path": "children",
          "query": {
            "exists": {
              "field": "children.first_name"
            }
          },
          "boost": 10,
          "score_mode": "sum"
        }
      }
    }
  }
}
Elastic 7+:
{
  "query": {
    "function_score": {
      "min_score": 20,
      "boost": 1,
      "score_mode": "multiply",
      "boost_mode": "replace",
      "query": {
        "nested": {
          "path": "children",
          "boost": 10,
          "score_mode": "sum",
          "query": {
            "constant_score": {
              "boost": 1,
              "filter": {
                "exists": {
                  "field": "children.first_name"
                }
              }
            }
          }
        }
      }
    }
  }
}
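The arithmetic behind min_score (minimum number of children times the boost) is easy to get wrong when the numbers change, so here is a minimal Python sketch that builds the Elastic 7+ request body shown above. The function name min_children_query and its parameters are hypothetical and only for illustration; the body it returns mirrors the request in this comment and has not been checked against a live cluster.

```python
def min_children_query(path, field, min_children, boost=10):
    """Build a request body matching parents with at least `min_children`
    nested documents under `path` (hypothetical helper, not an official API)."""
    return {
        "query": {
            "function_score": {
                # Each matching nested document contributes `boost` to the sum,
                # so N children yield a score of N * boost.
                "min_score": min_children * boost,
                "boost": 1,
                "score_mode": "multiply",
                "boost_mode": "replace",
                "query": {
                    "nested": {
                        "path": path,
                        "boost": boost,
                        "score_mode": "sum",
                        "query": {
                            "constant_score": {
                                "boost": 1,
                                "filter": {"exists": {"field": field}},
                            }
                        },
                    }
                },
            }
        }
    }


body = min_children_query("children", "children.first_name", min_children=2)
print(body["query"]["function_score"]["min_score"])  # → 20
```

The returned dictionary can be serialized with json.dumps and sent as the body of a _search request.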
Explanation: same as above, except here we apply a script to alter the score. The script checks that the sum of all boosts does not exceed max_children * boost; if it does, it returns 0, which guarantees the document will be excluded (0 < min_score).
Example: find all persons who have a minimum of 2 children and a maximum of 5 children (inclusive). With a boost of 10, the min_score is 20 (2 * 10) and the script threshold is 50 (5 * 10).
Before Elastic 7:
{
  "query": {
    "function_score": {
      "min_score": 20,
      "boost": 1,
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "if (_score > 50) { return 0; } return _score;",
              "lang": "painless"
            }
          }
        }
      ],
      "query": {
        "nested": {
          "path": "children",
          "query": {
            "exists": {
              "field": "children.first_name"
            }
          },
          "boost": 10,
          "score_mode": "sum"
        }
      }
    }
  }
}
Elastic 7+:
{
  "query": {
    "function_score": {
      "min_score": 20,
      "boost": 1,
      "score_mode": "multiply",
      "boost_mode": "replace",
      "functions": [
        {
          "filter": {
            "match_all": {
              "boost": 1
            }
          },
          "script_score": {
            "script": {
              "source": "if (_score > 50) { return 0; } return _score;",
              "lang": "painless"
            }
          }
        }
      ],
      "query": {
        "nested": {
          "path": "children",
          "boost": 10,
          "score_mode": "sum",
          "query": {
            "constant_score": {
              "boost": 1,
              "filter": {
                "exists": {
                  "field": "children.first_name"
                }
              }
            }
          }
        }
      }
    }
  }
}
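To see why the script plus min_score gives an inclusive range, here is a small Python simulation of the scoring logic described above. It assumes every matching nested document contributes exactly the boost to the summed score; the function names are hypothetical and this models the arithmetic only, not Elasticsearch itself.

```python
def nested_sum_score(child_count, boost=10):
    """Score produced by the nested query: `boost` per matching child, summed."""
    return child_count * boost


def script_score(score, max_children, boost=10):
    """Mirror of the Painless script: zero the score when it exceeds the cap."""
    if score > max_children * boost:
        return 0
    return score


def is_returned(child_count, min_children, max_children, boost=10):
    """A document survives when its final score is not lower than min_score."""
    score = script_score(nested_sum_score(child_count, boost), max_children, boost)
    return score >= min_children * boost


# Persons with 0 to 7 children, keeping those with 2 to 5 (inclusive).
print([n for n in range(8) if is_returned(n, 2, 5)])  # → [2, 3, 4, 5]
```

Both bounds are inclusive because the script only zeroes scores strictly above the cap, and min_score only drops scores strictly below it.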
Explanation: this request means you are asking for persons who have no children and persons who have 1 to n children. Expressing this with Elasticsearch can be tricky, so taking the negation makes it easier: you're asking to exclude all persons who have n + 1 or more children.
Example: find all persons who have at most 2 children. With a boost of 10, the min_score is 30 ((2 + 1) * 10), so anyone with 3 or more children is excluded.
Before Elastic 7:
{
  "query": {
    "bool": {
      "must_not": {
        "function_score": {
          "min_score": 30,
          "boost": 1,
          "query": {
            "nested": {
              "path": "children",
              "query": {
                "exists": {
                  "field": "children.first_name"
                }
              },
              "boost": 10,
              "score_mode": "sum"
            }
          }
        }
      }
    }
  }
}
Elastic 7+:
{
  "query": {
    "bool": {
      "must_not": {
        "function_score": {
          "min_score": 30,
          "boost": 1,
          "score_mode": "multiply",
          "boost_mode": "replace",
          "query": {
            "nested": {
              "path": "children",
              "boost": 10,
              "score_mode": "sum",
              "query": {
                "constant_score": {
                  "boost": 1,
                  "filter": {
                    "exists": {
                      "field": "children.first_name"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
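The off-by-one in the maximum-only case (the excluded threshold is one more than the allowed maximum) is worth making explicit. Below is a minimal Python sketch that builds the Elastic 7+ body above for an arbitrary maximum; max_children_query and its parameters are hypothetical names used only for illustration.

```python
def max_children_query(path, field, max_children, boost=10):
    """Build a request body matching parents with at most `max_children`
    nested documents, by excluding those with max_children + 1 or more."""
    too_many_children = {
        "function_score": {
            # Parents with at least (max_children + 1) children reach this score.
            "min_score": (max_children + 1) * boost,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "query": {
                "nested": {
                    "path": path,
                    "boost": boost,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {"exists": {"field": field}},
                        }
                    },
                }
            },
        }
    }
    # must_not keeps parents without children as well as those under the limit.
    return {"query": {"bool": {"must_not": too_many_children}}}


body = max_children_query("children", "children.first_name", max_children=2)
print(body["query"]["bool"]["must_not"]["function_score"]["min_score"])  # → 30
```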
Hope this helps.
Any update on this?
I'm trying to filter my results based on an exact length. @xethorn I can't seem to get your solution working with filters, could you point me in the right direction?
Here's my search with filters, which don't support scoring:
GET /test/_search
{
  "query": {
    "function_score": {
      "min_score": 20,
      "boost": 1,
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "if (_score > 20) { return - 1; } return _score;"
            }
          }
        }
      ],
      "query": {
        "bool": {
          "filter": [
            { "range": { "distance": { "lt": 5 } } },
            {
              "nested": {
                "score_mode": "sum",
                "boost": 10,
                "path": "dates",
                "query": {
                  "bool": {
                    "filter": [
                      { "range": { "dates.rooms": { "gte": 1 } } },
                      { "range": { "dates.timestamp": { "lte": 2 } } },
                      { "range": { "dates.timestamp": { "gte": 1 } } }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}
A few more details here: https://stackoverflow.com/questions/63226805/filter-query-by-length-of-nested-objects-ie-min-child
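Since clauses in a bool filter do not contribute to the score, the summed nested score the technique relies on never reaches function_score in the request above. One possible direction, sketched below in Python, is to keep the parent-level conditions in filter but move the nested clause into must, wrapping its conditions in a constant_score so each matching nested document scores 1 before the boost. This is an assumption-laden sketch, not the answer given on Stack Overflow, and it has not been run against a cluster; exact_children_query and its parameters are hypothetical names.

```python
def exact_children_query(path, nested_filters, parent_filters, count, boost=10):
    """Build a request body matching parents with exactly `count` nested
    documents satisfying `nested_filters` (hypothetical helper)."""
    threshold = count * boost
    return {
        "query": {
            "function_score": {
                "min_score": threshold,  # drops parents with fewer matches
                "boost": 1,
                "score_mode": "multiply",
                "boost_mode": "replace",
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                # Returns 0 (not a negative value) above the cap.
                                "source": "if (_score > params.max) { return 0; } return _score;",
                                "lang": "painless",
                                "params": {"max": threshold},
                            }
                        }
                    }
                ],
                "query": {
                    "bool": {
                        # Non-scoring conditions on the parent stay in filter.
                        "filter": parent_filters,
                        # The nested clause must score, so it goes in must.
                        "must": [
                            {
                                "nested": {
                                    "path": path,
                                    "boost": boost,
                                    "score_mode": "sum",
                                    "query": {
                                        "constant_score": {
                                            "boost": 1,
                                            "filter": {"bool": {"filter": nested_filters}},
                                        }
                                    },
                                }
                            }
                        ],
                    }
                },
            }
        }
    }


body = exact_children_query(
    path="dates",
    nested_filters=[
        {"range": {"dates.rooms": {"gte": 1}}},
        {"range": {"dates.timestamp": {"gte": 1, "lte": 2}}},
    ],
    parent_filters=[{"range": {"distance": {"lt": 5}}}],
    count=2,
)
```

With count=2 and a boost of 10, both min_score and the script cap are 20, so only parents whose nested matches sum to exactly 20 would remain.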
The question was answered on Stack Overflow. For comments and last updates, please refer to: GIST 290e31176f493814823a20f281e82fd4. :)
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-search-relevance (Team:Search Relevance)
I am opening this as a separate issue since the previous issue was closed with support for parent-child docs (https://github.com/elasticsearch/elasticsearch/issues/6019#issuecomment-77785163).
We would love to have support for min_children & max_children or similar also for nested filters/docs. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#_min_max_children_2
Thanks and keep up the great work.