apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

DISTINCTCOUNTHLL function returns an error when value of log2m is not 8 #12839

saurabhlambe closed 2 months ago

saurabhlambe commented 7 months ago

Example query:

SELECT DISTINCTCOUNTHLL(analytics_session_id_if_mobility_book,16) AS value FROM f_pax_app_sessions WHERE session_start_timestamp_10m >= '2024-01-12 17:00:00.0' AND session_start_timestamp_10m < '2024-01-15 17:00:00.0' HAVING value >= 0 LIMIT 10000000

As per the Pinot docs, the DISTINCTCOUNTHLL function takes two arguments, the second of which (log2m) is optional. When log2m is 8, the query runs correctly; with any other value, it throws the following error:

[BaseSingleBlockCombineOperator] [pqr-6] Caught exception while merging results blocks (query: QueryContext{_tableName='f_pax_app_sessions_OFFLINE', _subquery=null, _selectExpressions=[distinctcounthll(analytics_session_id,'16')], _distinct=false, _aliasList=[value], _filter=(is_mobility_home_or_mobility_confirmation = '1' AND (session_start_timestamp_10m >= '1707238800000' AND session_start_timestamp_10m < '1707498000000')), _groupByExpressions=null, _havingFilter=distinctcounthll(analytics_session_id,'16') >= '0', _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=4902}, _expressionOverrideHints={}, _explain=false})
java.lang.IllegalStateException: Cannot merge HyperLogLogs of different sizes
 at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:512) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.query.aggregation.function.DistinctCountHLLAggregationFunction.merge(DistinctCountHLLAggregationFunction.java:338) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.query.aggregation.function.DistinctCountHLLAggregationFunction.merge(DistinctCountHLLAggregationFunction.java:41) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.combine.merger.AggregationResultsBlockMerger.mergeResultsBlocks(AggregationResultsBlockMerger.java:42) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.combine.merger.AggregationResultsBlockMerger.mergeResultsBlocks(AggregationResultsBlockMerger.java:27) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.combine.BaseSingleBlockCombineOperator.mergeResults(BaseSingleBlockCombineOperator.java:146) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.combine.BaseSingleBlockCombineOperator.getNextBlock(BaseSingleBlockCombineOperator.java:62) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.combine.BaseSingleBlockCombineOperator.getNextBlock(BaseSingleBlockCombineOperator.java:45) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:43) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.InstanceResponseOperator.getCombinedResults(InstanceResponseOperator.java:118) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:111) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:39) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:43) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:57) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.executeInternal(ServerQueryExecutorV1Impl.java:376) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.executeInternal(ServerQueryExecutorV1Impl.java:252) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.execute(ServerQueryExecutorV1Impl.java:135) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.query.executor.QueryExecutor.execute(QueryExecutor.java:59) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:154) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:136) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
 at org.apache.pinot.shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.shaded.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at org.apache.pinot.shaded.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
 at java.lang.Thread.run(Thread.java:829) [?:?]

gortiz commented 7 months ago

Can this be replicated in any quickstart? Can you also specify the Pinot version where the error is present and the types and indexes involved in the query/table?

gortiz commented 7 months ago

The issue turned out to be more complex than expected. There is a DISTINCT_COUNT_HLL star-tree index on that column, and its pre-aggregated HLLs were built with the default log2m. When a query touches one segment whose HLL is read from the star-tree index and another whose HLL is computed at query time with a different log2m, the two sketches cannot be merged and the error is thrown.
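To illustrate why the merge fails, here is a minimal sketch of an HLL-like structure, assuming the usual layout of 2^log2m registers merged by taking the register-wise maximum. `HllMergeSketch` is a hypothetical toy class written for this thread, not Pinot's or any library's implementation; only the error message is copied from the stack trace above.

```java
/** Toy stand-in for an HLL sketch (hypothetical, not Pinot code): only the registers matter here. */
public class HllMergeSketch {
    final int log2m;
    final byte[] registers; // 2^log2m registers

    public HllMergeSketch(int log2m) {
        this.log2m = log2m;
        this.registers = new byte[1 << log2m];
    }

    /** Records an already-hashed value: low log2m bits pick the register, the rest give the rank. */
    public void offer(int hash) {
        int index = hash & (registers.length - 1);
        int remaining = hash >>> log2m;
        int maxRank = 32 - log2m + 1;
        // Rank = 1-based position of the lowest set bit in the remaining bits, capped at maxRank.
        int rank = Math.min(Integer.numberOfTrailingZeros(remaining) + 1, maxRank);
        registers[index] = (byte) Math.max(registers[index], rank);
    }

    /** Register-wise max; only defined when both sketches have the same number of registers. */
    public void merge(HllMergeSketch other) {
        if (registers.length != other.registers.length) {
            throw new IllegalStateException("Cannot merge HyperLogLogs of different sizes");
        }
        for (int i = 0; i < registers.length; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    public static void main(String[] args) {
        HllMergeSketch fromStarTree = new HllMergeSketch(8);  // pre-aggregated with the default log2m
        HllMergeSketch fromRuntime = new HllMergeSketch(16);  // computed at query time with log2m = 16
        fromStarTree.offer(0x12345678);
        fromRuntime.offer(0x12345678);
        try {
            fromStarTree.merge(fromRuntime);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With log2m = 8 one sketch has 256 registers and with log2m = 16 the other has 65536, so there is no register-by-register correspondence to merge on.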

I think Pinot should detect the discrepancy in the arguments and not use the star-tree index in this case. What do you think, @Jackie-Jiang?

Jackie-Jiang commented 7 months ago

Yes. This requires bigger changes. Basically, the star-tree index should store the extra arguments in its metadata and match on the whole aggregation. Currently it stores only the main column and the aggregation type, which causes this problem.
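The idea of matching on the whole aggregation can be sketched roughly as follows. This is only an illustration under stated assumptions: `AggregationSpec`, `hll`, and `canUseStarTree` are hypothetical names, not Pinot classes or methods, and the default log2m of 8 is taken from the behaviour reported in this thread.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch: a star-tree is eligible only if function, column and log2m all match. */
public class StarTreeMatchSketch {
    // Assumed default, based on the report that log2m = 8 is the value that works.
    static final int DEFAULT_LOG2M = 8;

    /** What the index metadata would need to record for an HLL pre-aggregation. */
    public static final class AggregationSpec {
        final String function;
        final String column;
        final int log2m;

        public AggregationSpec(String function, String column, int log2m) {
            this.function = function;
            this.column = column;
            this.log2m = log2m;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof AggregationSpec)) {
                return false;
            }
            AggregationSpec that = (AggregationSpec) o;
            return function.equals(that.function) && column.equals(that.column) && log2m == that.log2m;
        }

        @Override
        public int hashCode() {
            return Objects.hash(function, column, log2m);
        }
    }

    /** Builds the spec for DISTINCTCOUNTHLL(column, log2m). */
    public static AggregationSpec hll(String column, int log2m) {
        return new AggregationSpec("DISTINCTCOUNTHLL", column, log2m);
    }

    /** The star-tree can serve the query only if an identical aggregation was pre-computed. */
    public static boolean canUseStarTree(List<AggregationSpec> preAggregated, AggregationSpec query) {
        return preAggregated.contains(query);
    }

    public static void main(String[] args) {
        List<AggregationSpec> index = Collections.singletonList(hll("analytics_session_id", DEFAULT_LOG2M));
        System.out.println(canUseStarTree(index, hll("analytics_session_id", 8)));  // true
        System.out.println(canUseStarTree(index, hll("analytics_session_id", 16))); // false
    }
}
```

When the check returns false, the segment would fall back to computing the HLL at query time, so every segment produces sketches with the same log2m and the merge succeeds.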

yashmayya commented 2 months ago

Will be fixed by https://github.com/apache/pinot/pull/13835.