Open jpountz opened 7 years ago
@martijnvg I wonder whether the NestedScope
that you put in place some time ago could help improve validation?
@jpountz I think it can be used for that. I think we can add validation in NestedScope#nextLevel(...)
method where we both know the current and next nested object level. I think we can check the nestedTypePathAsString()
of each object mapper (current has to be a prefix of the next level). If we can place this validation here then the validation would apply for nested query, nested agg and reverse_nested agg..
I think that doc values based block join can help with validation too. The nested Lucene documents will record the layout and that will make validation easier.
Ideally we'd validate all field references (terms
aggs etc) to check they were relevant to the current nested agg/query scope but sadly we have to support unmapped fields too so we can't tell the difference between an unmapped field and one that is missing the correct path prefix.
Currently the only time we can validate that the wrong nested path has been used is if it also happens to be a valid path to a a field from the doc root.
Could we have a search option to only allow references to mapped fields? This would help us significantly tighten up this sort of query validation.
Could we have a search option to only allow references to mapped fields? This would help us significantly tighten up this sort of query validation.
+1 I think we should have such an option that builds on top of QueryShardContext#setAllowUnmappedFields(...)
infrastructure that already exists.
QueryShardContext#setAllowUnmappedFields(...) infrastructure that already exists.
@jpountz was +1 for removing that feature - it is currently configured as part of index settings which are undocumented. The validation requirement we settled for is a per-query choice (because queries can span many indices. We allow individual indices to fail but only fail if all targeted indices were reporting a field as unmapped. This validation relies on shards returning sets of unmapped field names to be ANDed at the coordinating node. Additionally, we need to "know when we don't know" - recent query optimisations mean that we sometimes skip parsing/executing some aspects of a query on some shards. We can't start throwing errors if there's a possibility that one of the skipped indices could have confirmed the validity of a field name but was skipped for efficiency's sake. PR https://github.com/elastic/elasticsearch/pull/26115 if you're interested.
@markharwood That makes sense. Thx for explaining.
Are we still looking to fix this particular issue in isolation, or is the idea to wait for a more global fix like https://github.com/elastic/elasticsearch/pull/26115 ?
@elastic/es-search-aggs
Pinging @elastic/es-analytical-engine (Team:Analytics)
Currently it is possible to mix
nested
andfilter(s)
aggregations in such a way that thefilter
aggregation matches documents that are not in the same scope as its parent aggregation, which can cause weird failures, see for instance #23280. We should improve validation to catch these problems earlier and raise a nicer error message.