apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

NullPointerException is thrown when query with aggregation on top of groovy functions #6253

Open chundongwang opened 3 years ago

chundongwang commented 3 years ago

I'm getting an NPE in GroupByOrderByCombineOperator.getNextBlock when a query aggregates on top of a groovy function, no matter which column I group by (e.g. dimension, datetime) or which aggregation function I use (e.g. AVG, SUM). The query looks like this:

select
  country_code,
  avg(groovy('{"returnType":"DOUBLE","isSingleValue":true}', 'arg0 > arg1 ? arg0 : arg1', subtotal, total)) as average_rev
from orders
group by country_code
limit 10

Without the group by, the groovy function above works fine with enough records.

Expected results

I expect aggregation on top of a transform to work for groovy just as it does for other transform functions like ADD or SUB.

Actual results

An NPE is thrown:

QueryExecutionError:
java.lang.NullPointerException
  at org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.getNextBlock(GroupByOrderByCombineOperator.java:215)
  at org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.getNextBlock(GroupByOrderByCombineOperator.java:62)
  at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)
  at org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:37)
  at org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:26)
  at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)
  at org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:48)
  at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:221)
  at org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:155)
  at org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:139)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
  at shaded.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)

UPDATE1: We're running v0.5.0

chundongwang commented 3 years ago

With help from @npawar (thanks!), I got it to work. To sum up:

  1. Aggregation functions like Avg/Sum/PercentileTDigest## all require a double, so the groovy return value needs to be converted to double. That means every possible return value should be cast to double (e.g. using column_name.toDouble()).
  2. I had a number literal, 0.0, in the query, which I assumed would be a double. It actually became a BigDecimal and caused an issue. I used 0d instead and the second exception went away.
  3. More exception details can be found in the server log, not the broker log.
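
For reference, here is a sketch of how the query from the original report might look with point 1 applied. It is not the exact query that was run: it assumes subtotal and total are numeric columns, and wrapping the whole ternary in toDouble() is just one way to make sure every branch returns a double.

```sql
-- Sketch only: same query as in the report, with the groovy result
-- explicitly converted to double so AVG receives the type it expects.
select
  country_code,
  avg(groovy('{"returnType":"DOUBLE","isSingleValue":true}', '(arg0 > arg1 ? arg0 : arg1).toDouble()', subtotal, total)) as average_rev
from orders
group by country_code
limit 10
```

If the script needs a numeric literal, write it with the d suffix (0d), since Groovy parses an unsuffixed decimal literal such as 0.0 as a BigDecimal.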