apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

HLLSketchEstimateWithBounds can't be applied to arithmetic post aggregator #8504

Closed Foxterran closed 5 years ago

Foxterran commented 5 years ago

HLLSketchEstimateWithBounds can't be used as a numeric value in an arithmetic post aggregator, because it returns an array: [estimate, lower bound, upper bound].

Affected Version

0.14.2

Description

I have two HLLSketch metrics, unique_click_visitors and unique_view_visitors, built with the "HLLSketchBuild" aggregator at ingestion time. They work fine when queried through aggregators. However, I need the ratio of the two metrics, i.e. unique_click_visitors / unique_view_visitors, to get the click-through rate.

Following this doc, https://druid.apache.org/docs/latest/development/extensions-core/datasketches-hll.html , I wrapped HLLSketchEstimateWithBounds in an arithmetic post aggregator. The query spec (only the postAggregations part) is below:

"postAggregations": [
  {
    "type": "arithmetic",
    "name": "click_through_rate",
    "fn": "/",
    "fields": [
      {
        "type": "HLLSketchEstimateWithBounds",
        "name": "unique_click_visitors",
        "field": {
          "type": "fieldAccess",
          "name": "unique_click_visitors",
          "fieldName": "unique_click_visitors"
        }
      },
      {
        "type": "HLLSketchEstimateWithBounds",
        "name": "unique_view_visitors",
        "field": {
          "type": "fieldAccess",
          "name": "unique_view_visitors",
          "fieldName": "unique_view_visitors"
        }
      }
    ]
  }
]

However, the query keeps failing with this exception:

{
  "error": "Unknown exception",
  "errorMessage": "[D cannot be cast to java.lang.Number",
  "errorClass": "java.lang.ClassCastException",
  "host": null
}

I looked at the source code: https://github.com/apache/incubator-druid/blob/master/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchToEstimateWithBoundsPostAggregator.java

It looks like this post aggregator returns not only the estimate but also its upper and lower bounds, packaged together as an array (a double[], which is the "[D" in the error message). The arithmetic post aggregator expects each field to be a single Number, hence the ClassCastException.

I can work around this error by doing the post aggregation in the application layer. But is anyone else running into this issue? If so, please share how you got around it.
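For anyone who wants to try the application-layer workaround, here is a minimal Python sketch. It assumes each result row exposes the two HLLSketchEstimateWithBounds outputs as three-element arrays with the estimate in the first position. The function name click_through_rate, the row layout, and the sample numbers are hypothetical and only for illustration.

```python
def click_through_rate(row,
                       clicks_key="unique_click_visitors",
                       views_key="unique_view_visitors"):
    """Compute clicks / views from one query result row.

    Assumption: each value is the three-element array produced by
    HLLSketchEstimateWithBounds, with the estimate at index 0.
    Returns None when the view estimate is zero.
    """
    clicks = row[clicks_key][0]  # estimate only; the other two entries are the bounds
    views = row[views_key][0]
    if views == 0:
        return None
    return clicks / views


# Hypothetical result row, made up for illustration.
row = {
    "unique_click_visitors": [250.0, 245.0, 255.0],
    "unique_view_visitors": [1000.0, 990.0, 1010.0],
}
print(click_through_rate(row))  # prints 0.25
```

This keeps the error bounds available on the client, at the cost of computing the ratio outside Druid.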

Thank YOU!

Foxterran commented 5 years ago

It looks like finalizingFieldAccess solves this issue. Change the postAggregations part to:

"postAggregations": [
  {
    "type": "arithmetic",
    "name": "click_through_rate",
    "fn": "/",
    "fields": [
      {
        "type": "finalizingFieldAccess",
        "name": "unique_click_visitors",
        "fieldName": "unique_click_visitors"
      },
      {
        "type": "finalizingFieldAccess",
        "name": "unique_view_visitors",
        "fieldName": "unique_view_visitors"
      }
    ]
  }
]
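If the query is assembled in client code, the same fragment can be generated rather than hand-written. The Python sketch below is only an illustration: the helper name ratio_post_aggregator is hypothetical, and it assumes both names refer to HLL sketch aggregators already declared in the query's aggregations section, so that finalizingFieldAccess yields a single numeric estimate for each.

```python
import json


def ratio_post_aggregator(name, numerator, denominator):
    """Build an arithmetic post aggregator that divides two finalized aggregators.

    Hypothetical helper; numerator and denominator are assumed to be the
    names of aggregators defined elsewhere in the same query.
    """
    return {
        "type": "arithmetic",
        "name": name,
        "fn": "/",
        "fields": [
            {"type": "finalizingFieldAccess", "name": numerator, "fieldName": numerator},
            {"type": "finalizingFieldAccess", "name": denominator, "fieldName": denominator},
        ],
    }


post_aggregations = [
    ratio_post_aggregator(
        "click_through_rate", "unique_click_visitors", "unique_view_visitors"
    )
]

# Print the fragment so it can be pasted into (or merged with) a query spec.
print(json.dumps({"postAggregations": post_aggregations}, indent=2))
```

Note that this route returns only the estimate, so the upper and lower bounds are not available in the result.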