elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.94k stars 24.74k forks source link

Validate agg order parameter at parse time #20003

Open colings86 opened 8 years ago

colings86 commented 8 years ago

At the moment we don't validate that the order parameter for the histogram, date_histogram and terms aggregations is valid at parse time. Instead we rely on an exception to be thrown when we try to sort the buckets at reduce time. Apart from the not being a nice user experience as we should fail quickly on bad requests this also causes a few other issues highlighted in #14771 and #19851:

qwerty4030 commented 8 years ago

I think the terms aggregation validates the order during the search phase. I get a 500 response with a bad terms agg order vs a 503 from a bad (date) histogram order. Example. It looks like validation code does exist in AggregationPath.java. The code gets a bit hairy but I think the reason this is called during the search phrase is that all the Aggregators need to be created before validation which happens in the AggregationPhase. Would it be possible to create the Aggregators during the parsing phase and then perform validation? If that's too painful it seems it wouldn't be too hard to call the same validation logic the terms agg uses during the search phrase for the (date) histogram.

qwerty4030 commented 8 years ago

Also the terms agg supports multiple criteria for the order, but the (date) histogram agg does not: "order" : [ { "rock>playback_stats.avg" : "desc" }, { "_count" : "desc" } ]. I don't see a reason why the histogram agg order would have different behavior than the terms agg. The class org.elasticsearch.search.aggregations.bucket.terms.InternalOrder.CompoundOrder contains the code to support multiple criteria for the terms agg order.

colings86 commented 8 years ago

@qwerty4030 the parsing happens on the coordinating node (the node that receives the request from the client) but Aggregators are created on the shard so we can't create the Aggregators during the parsing phase and perform the validation with them. I think we will need to create code to do the validation using the AggregatorBuilder objects instead since this is what we have available on the coordinating node at parse time.

As for the discrepancy between order in the terms aggregation and the histogram aggregations, it would be nice to have them consistent. Maybe we could have both use the same Order classes rather than having separate implementations but I fear that this will be fairly tricky.

qwerty4030 commented 8 years ago

I can see this happening in two parts:

  1. Refactor histogram aggs to use the same order code/logic as the terms agg.
  2. Refactor order to be validated at parse time (use AggregatorBuilder instead of Aggregator).

1 doesn't seem too bad since the code already exists. 2 is more involved: I saw alot of instanceofs, casting, and tricky assumptions in the current order validation. Ideally the order validation can be cleaned up to be more interface friendly.

javanna commented 7 years ago

@colings86 does this issue need further discussion?

qwerty4030 commented 7 years ago

Well now that step 1 was done in #22343 we can give part 2 a shot. @colings86 @javanna Do you guys see any obvious roadblocks to validate terms and histogram aggs that are ordered by sub-aggregation, e.g. multi sub-agg levels, metrics, scripts, etc... by just looking at the AggregatorBuilder instance? Thanks

colings86 commented 7 years ago

@qwerty4030 I don't see any roadblocks. I think this should be possible since the pipeline aggregations do something a little similar when validating the buckets_path, however I'm not going to go so far as saying that its going to be easy. Maybe we should start with trying to add the validation on AggregatorBuilder's without modifying the existing logic with the Aggregator's (so essentially do both) and then once we know the validation using the AggregatorBuilder's is robust we can remove any unnecessary logic later int he execution?

qwerty4030 commented 7 years ago

Sounds good. Thanks for the tip about the pipeline aggregations. I found some relevant code here, but I'm not sure how robust this validation is.

polyfractal commented 6 years ago

@elastic/es-search-aggs

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

wchaparro commented 1 year ago

Because the exception is thrown in the reduce phase it is a SearchPhaseExecutionException which maps to a 503 Unavailable status code when in fact it should return a 400 Bad Request status code and ideal the exception should be a SearchParseException

Changing this to a bug - to address that we are returning the correct error code.

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)