elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Support min_children & max_children for nested docs #10043

Open jsuchal opened 9 years ago

jsuchal commented 9 years ago

I am opening this as a separate issue since the previous issue was closed with support for parent-child docs (https://github.com/elasticsearch/elasticsearch/issues/6019#issuecomment-77785163).

We would love to have support for min_children & max_children or similar also for nested filters/docs. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#_min_max_children_2

Thanks, and keep up the great work.

martijnvg commented 9 years ago

+1 we should add this! I think we should also open an issue for this in Lucene, because the nested query uses the ToParentBlockJoinQuery Lucene query to do the actual work.

martijnvg commented 9 years ago

I opened: https://issues.apache.org/jira/browse/LUCENE-6354 to get this in Lucene

gmenegatti commented 9 years ago

+1

kmcs commented 9 years ago

+1

lonre commented 8 years ago

+1

guilherme-santos commented 7 years ago

+1

asuiu commented 7 years ago

+1

conan commented 7 years ago

+1

voran commented 7 years ago

+1

voran commented 7 years ago

This would be a great feature to have. The only reason why we are using parent/child instead of nested mapping is the lack of min_children/max_children options in the nested query. Considering that:

  1. Nested queries are much faster than has_child queries;
  2. Elasticsearch is moving in the one-type-per-index direction;

I would very much like to see this implemented. Please let me know if there's anything I can do to help.

turp1twin commented 6 years ago

Any update on this? Would be really, really good to have this feature!

andyb-elastic commented 6 years ago

@elastic/es-search-aggs

colings86 commented 6 years ago

Stalled waiting for https://issues.apache.org/jira/browse/LUCENE-6354 to be completed and merged

bw2 commented 5 years ago

+1

thaDude commented 5 years ago

+1

I'd love to see this.

To give some context: while the main reason for us to migrate to a parent/child model from nested was indexing speed, we also did so because of the min_children and max_children feature.

However, we have become painfully aware of the cost of has_child queries (joins) as the number of child documents and/or complexity of queries increases. OOM exceptions have become too frequent for comfort.

For stability reasons, we are reconsidering the nested model even if it means decreased indexing speed. Knowing that min_children and max_children are still planned for that model would reassure us.

Thank you!

xethorn commented 5 years ago

Note: for comments and last updates, please refer to: GIST 290e31176f493814823a20f281e82fd4.

Alternative solution

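To support min_children and max_children for your nested query, all you have to do is use a function_score query. To make this more concrete, assume an index called Person whose mapping has a nested field, children, with a first_name subfield (the queries below rely on the path children and the field children.first_name).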
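
The mapping itself is not reproduced in this copy of the comment. As a minimal sketch, here is a mapping that would be consistent with the queries below, written as a Python dict in the typeless Elastic 7+ format. Everything except the nested path `children` and the field `children.first_name` is an assumption for illustration, including the field types and the lowercase index name `person` (Elasticsearch index names must be lowercase).

```python
# Hypothetical mapping, inferred only from the queries in this comment
# (path "children", field "children.first_name"). Field types are assumptions.
person_mapping = {
    "mappings": {
        "properties": {
            "first_name": {"type": "keyword"},
            "children": {
                # "nested" keeps each child as its own hidden document,
                # which is what lets the nested query score children one by one.
                "type": "nested",
                "properties": {
                    "first_name": {"type": "keyword"},
                },
            },
        }
    }
}
```

A body like this would be sent when creating the index, for example with `PUT /person`.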

Cases

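To effectively support min_children and max_children, there are multiple cases to consider:

  1. Persons who have no children;
  2. Persons who have a minimum of n children;
  3. Persons who have between n and m children;
  4. Persons who have only a maximum number of children.

Depending on the scenario, the request will look different.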

Notes about function_score:

Find all persons who have no children

Explanation: this is the easiest query. You only have to verify that no nested document exists, which is significantly faster than using function_score.

{
    "query": {
        "bool": {
            "must_not": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    }
                }
            }    
        }
    }
}

Find all persons who have a minimum of n children

Explanation: each matching nested document scores 10 (the boost), and the nested query sums these scores per parent ("score_mode": "sum"). The function_score then drops any parent whose total is below min_score.

Example: find all persons who have a minimum of 2 children. The boost applied here is 10 (you can set any positive number you want), so min_score is 20 (2 * 10).

Before Elastic 7:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "query": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    },
                    "boost": 10,
                    "score_mode": "sum"
                }
            }
        }
    }
}

Elastic 7+:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "query": {
                "nested": {
                    "path": "children",
                    "boost": 10,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
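
Since min_score is always n * boost, the request can be generated rather than written by hand. The following is a minimal sketch that mirrors the Elastic 7+ body above; the helper name `min_children_query` and its parameters are hypothetical, not part of any Elasticsearch client.

```python
import json


def min_children_query(path, field, n, boost=10):
    """Build a request body matching parents with at least n nested docs.

    Hypothetical helper: it only assembles the same JSON shown above.
    """
    if n < 1:
        raise ValueError("n must be >= 1; use a must_not nested query for zero children")
    return {
        "query": {
            "function_score": {
                # Each matching child contributes `boost`, so n children sum to n * boost.
                "min_score": n * boost,
                "boost": 1,
                "score_mode": "multiply",
                "boost_mode": "replace",
                "query": {
                    "nested": {
                        "path": path,
                        "boost": boost,
                        "score_mode": "sum",
                        "query": {
                            "constant_score": {
                                "boost": 1,
                                "filter": {"exists": {"field": field}},
                            }
                        },
                    }
                },
            }
        }
    }


body = min_children_query("children", "children.first_name", n=2)
print(json.dumps(body, indent=4))
```

With n=2 and the default boost of 10, the generated min_score is 20, matching the example above.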

Find all persons who have between n and m children (n > 0).

Explanation: same as above, except that a script alters the score. The script checks whether the summed score exceeds m * boost; if it does, it returns 0, which guarantees the document is excluded (0 < min_score).

Example: Find all persons who have a minimum of 2 children and a maximum of 5 children (inclusive).

Before Elastic 7:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "if (_score > 50) { return 0; } return _score;",
                            "lang": "painless"
                        }
                    }
                }
            ],
            "query": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    },
                    "boost": 10,
                    "score_mode": "sum"
                }
            }
        }
    }
}

Elastic 7+:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "functions": [
                {
                    "filter": {
                        "match_all": {
                            "boost": 1
                        }
                    },
                    "script_score": {
                        "script": {
                            "source": "if (_score > 50) { return 0; } return _score;",
                            "lang": "painless"
                        }
                    }
                }
            ],
            "query": {
                "nested": {
                    "path": "children",
                    "boost": 10,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
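
The two thresholds (n * boost as min_score, m * boost inside the script) can be derived from n and m in one place. Below is a minimal sketch under a few assumptions: the helper names `children_between_query` and `is_kept` are hypothetical, and the upper bound is passed through the script's `params` instead of being hard-coded in the source as in the request above. `is_kept` is only an offline model of the scoring arithmetic, not something Elasticsearch runs.

```python
def children_between_query(path, field, n, m, boost=10):
    """Build a request body matching parents with n to m nested docs (inclusive)."""
    if not 1 <= n <= m:
        raise ValueError("expected 1 <= n <= m")
    return {
        "query": {
            "function_score": {
                "min_score": n * boost,
                "boost": 1,
                "score_mode": "multiply",
                "boost_mode": "replace",
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "lang": "painless",
                                # Same check as above, with the cap supplied as a parameter.
                                "source": "if (_score > params.max_score) { return 0; } return _score;",
                                "params": {"max_score": m * boost},
                            }
                        }
                    }
                ],
                "query": {
                    "nested": {
                        "path": path,
                        "boost": boost,
                        "score_mode": "sum",
                        "query": {
                            "constant_score": {
                                "boost": 1,
                                "filter": {"exists": {"field": field}},
                            }
                        },
                    }
                },
            }
        }
    }


def is_kept(child_count, n, m, boost=10):
    """Offline model: nested sum, then the script, then the min_score cut-off."""
    score = child_count * boost   # nested query with score_mode "sum"
    if score > m * boost:         # the script zeroes anything above the cap
        score = 0
    return score >= n * boost     # function_score keeps scores >= min_score


body = children_between_query("children", "children.first_name", n=2, m=5)
kept = [count for count in range(8) if is_kept(count, n=2, m=5)]
print(kept)  # prints [2, 3, 4, 5]
```

The model shows why the bounds are inclusive: a parent with exactly 5 children scores 50, which is not greater than the cap, while 6 children score 60 and are reset to 0.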

Find all persons who have at most n children (n > 0).

Explanation: this request asks for persons who have no children plus persons who have 1 to n children. Expressing this directly in Elasticsearch can be tricky, so taking the negation makes it easier: you exclude all persons who have at least n + 1 children.

Example: find all persons who have at most 2 children, i.e. exclude everyone whose score reaches 30 (3 * 10).

Before Elastic 7:

{
    "query": {
        "bool": {
            "must_not": {
                "function_score": {
                    "min_score": 30,
                    "boost": 1,
                    "query": {
                        "nested": {
                            "path": "children",
                            "query": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            },
                            "boost": 10,
                            "score_mode": "sum"
                        }
                    }
                }
            }
        }
    }
}

Elastic 7+:

{
    "query": {
        "bool": {
            "must_not": {
                "function_score": {
                    "min_score": 30,
                    "boost": 1,
                    "score_mode": "multiply",
                    "boost_mode": "replace",
                    "query": {
                        "nested": {
                            "path": "children",
                            "boost": 10,
                            "score_mode": "sum",
                            "query": {
                                "constant_score": {
                                    "boost": 1,
                                    "filter": {
                                        "exists": {
                                            "field": "children.first_name"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
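
The negated form can be generated the same way. This is a minimal sketch mirroring the Elastic 7+ body above; the helper name `at_most_children_query` is hypothetical, and its `n` is assumed to be the largest child count that should still match, so the excluded threshold is (n + 1) * boost.

```python
def at_most_children_query(path, field, n, boost=10):
    """Build a request body matching parents with 0 to n nested docs."""
    if n < 1:
        raise ValueError("n must be >= 1; use a must_not nested query for zero children")
    # Parents with at least n + 1 children: these are the ones to exclude.
    too_many_children = {
        "function_score": {
            "min_score": (n + 1) * boost,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "query": {
                "nested": {
                    "path": path,
                    "boost": boost,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {"exists": {"field": field}},
                        }
                    },
                }
            },
        }
    }
    # must_not keeps every parent that does not reach the threshold,
    # including parents with no nested documents at all.
    return {"query": {"bool": {"must_not": too_many_children}}}


body = at_most_children_query("children", "children.first_name", n=2)
```

With n=2 the generated min_score is 30, the same value used in the request above.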

Hope this helps.


DustinJSilk commented 4 years ago

Any update on this?

I'm trying to filter my results based on an exact length. @xethorn I can't seem to get your solution working with filters, could you point me in the right direction?

Here's my search with filters, which don't support scoring:

GET /test/_search
{
  "query" : {
    "function_score": {
      "min_score": 20,
      "boost": 1,
      "functions": [
        {
          "script_score": {
            "script": {
                "source": "if (_score > 20) { return - 1; } return _score;"
            }
          }
        }
      ],
      "query": {
        "bool" : {
          "filter": [
            { "range": { "distance": { "lt": 5 }}},
            {
              "nested": {
                "score_mode": "sum",
                "boost": 10,
                "path": "dates",
                "query": {
                  "bool": {
                    "filter": [
                      { "range": { "dates.rooms": { "gte": 1 } } },
                      { "range": { "dates.timestamp": { "lte": 2 }}},
                      { "range": { "dates.timestamp": { "gte": 1 }}}
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

A few more details here: https://stackoverflow.com/questions/63226805/filter-query-by-length-of-nested-objects-ie-min-child

xethorn commented 3 years ago

The question was answered on Stack Overflow. For comments and last updates, please refer to: GIST 290e31176f493814823a20f281e82fd4. :)

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)