elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.6k stars 24.63k forks source link

Improve validation of filter(s)/nested aggregations #23290

Open jpountz opened 7 years ago

jpountz commented 7 years ago

Currently it is possible to mix nested and filter(s) aggregations in such a way that the filter aggregation matches documents that are not in the same scope as its parent aggregation, which can cause weird failures, see for instance #23280. We should improve validation to catch these problems earlier and raise a nicer error message.

jpountz commented 7 years ago

@martijnvg I wonder whether the NestedScope that you put in place some time ago could help improve validation?

martijnvg commented 7 years ago

@jpountz I think it can be used for that. I think we can add validation in NestedScope#nextLevel(...) method where we both know the current and next nested object level. I think we can check the nestedTypePathAsString() of each object mapper (current has to be a prefix of the next level). If we can place this validation here then the validation would apply for nested query, nested agg and reverse_nested agg..

martijnvg commented 7 years ago

I think that doc values based block join can help with validation too. The nested Lucene documents will record the layout and that will make validation easier.

markharwood commented 7 years ago

Ideally we'd validate all field references (terms aggs etc) to check they were relevant to the current nested agg/query scope but sadly we have to support unmapped fields too so we can't tell the difference between an unmapped field and one that is missing the correct path prefix.

Currently the only time we can validate that the wrong nested path has been used is if it also happens to be a valid path to a a field from the doc root.

Could we have a search option to only allow references to mapped fields? This would help us significantly tighten up this sort of query validation.

martijnvg commented 6 years ago

Could we have a search option to only allow references to mapped fields? This would help us significantly tighten up this sort of query validation.

+1 I think we should have such an option that builds on top of QueryShardContext#setAllowUnmappedFields(...) infrastructure that already exists.

markharwood commented 6 years ago

QueryShardContext#setAllowUnmappedFields(...) infrastructure that already exists.

@jpountz was +1 for removing that feature - it is currently configured as part of index settings which are undocumented. The validation requirement we settled for is a per-query choice (because queries can span many indices. We allow individual indices to fail but only fail if all targeted indices were reporting a field as unmapped. This validation relies on shards returning sets of unmapped field names to be ANDed at the coordinating node. Additionally, we need to "know when we don't know" - recent query optimisations mean that we sometimes skip parsing/executing some aspects of a query on some shards. We can't start throwing errors if there's a possibility that one of the skipped indices could have confirmed the validity of a field name but was skipped for efficiency's sake. PR https://github.com/elastic/elasticsearch/pull/26115 if you're interested.

martijnvg commented 6 years ago

@markharwood That makes sense. Thx for explaining.

polyfractal commented 6 years ago

Are we still looking to fix this particular issue in isolation, or is the idea to wait for a more global fix like https://github.com/elastic/elasticsearch/pull/26115 ?

@elastic/es-search-aggs

jpountz commented 5 years ago

41559 is another issue with nested validation.

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)