apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Boolean expression on top of a MV column filter fails #10478

Open kirkrodrigues opened 1 year ago

kirkrodrigues commented 1 year ago

Executing a query like SELECT * FROM table WHERE (mvCol = 'test') = true fails with the following exception:

2023/03/25 01:17:47.521 ERROR [BaseCombineOperator] [pqw-2] Caught exception while processing query: QueryContext{_tableName='random_REALTIME', _subquery=null, _selectExpressions=[multiCol, timestamp], _aliasList=[null, null], _filter=equals(multiCol,'null') = 'true', _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _expressionOverrideHints={}, _explain=false}
java.lang.UnsupportedOperationException: null
    at org.apache.pinot.segment.spi.index.mutable.MutableForwardIndex.readDictIds(MutableForwardIndex.java:74) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.segment.spi.index.mutable.MutableForwardIndex.readDictIds(MutableForwardIndex.java:79) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:570) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:239) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:277) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:153) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.transform.function.IdentifierTransformFunction.transformToStringValuesSV(IdentifierTransformFunction.java:111) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.transform.function.BinaryOperatorTransformFunction.fillResultString(BinaryOperatorTransformFunction.java:284) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.transform.function.BinaryOperatorTransformFunction.fillResultArray(BinaryOperatorTransformFunction.java:135) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.transform.function.BinaryOperatorTransformFunction.transformToIntValuesSV(BinaryOperatorTransformFunction.java:109) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.dociditerators.ExpressionScanDocIdIterator.processProjectionBlock(ExpressionScanDocIdIterator.java:156) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.dociditerators.ExpressionScanDocIdIterator.next(ExpressionScanDocIdIterator.java:88) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.DocIdSetOperator.getNextBlock(DocIdSetOperator.java:75) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.DocIdSetOperator.getNextBlock(DocIdSetOperator.java:39) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:43) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.ProjectionOperator.getNextBlock(ProjectionOperator.java:70) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.ProjectionOperator.getNextBlock(ProjectionOperator.java:37) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:43) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.query.SelectionOnlyOperator.getNextBlock(SelectionOnlyOperator.java:97) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.query.SelectionOnlyOperator.getNextBlock(SelectionOnlyOperator.java:41) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:43) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.combine.BaseSingleBlockCombineOperator.processSegments(BaseSingleBlockCombineOperator.java:102) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:107) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at org.apache.pinot.shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.shaded.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at org.apache.pinot.shaded.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3f36623c563b5ad495a502a00f627b44bfc0861d]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]

Version

apache/pinot@3f36623

Jackie-Jiang commented 1 year ago

This query fails because mvCol = 'test' is treated as a transform function (instead of a filter predicate), and BinaryOperatorTransformFunction currently doesn't support MV columns as input.

SELECT * FROM table WHERE mvCol = 'test' should work
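
To illustrate the distinction between the two paths, here is a minimal Java sketch. It does not use Pinot's classes: MvEvalSketch, filterMatches, and transformEquals are hypothetical names, and the "any value matches" rule for an MV equality filter is an assumption about the filter semantics, not code taken from the project.

```java
import java.util.List;

// Hypothetical model for illustration only; these are not Pinot classes.
final class MvEvalSketch {
    // Filter-predicate path (assumed semantics): an equality filter on an MV
    // column matches a row when any of the row's values equals the target.
    static boolean filterMatches(List<String> mvValues, String target) {
        return mvValues.contains(target);
    }

    // Transform-function path: models an evaluator that only knows how to
    // read one value per row, so an MV input is rejected instead of compared.
    static int transformEquals(Object columnValue, String target) {
        if (columnValue instanceof List) {
            throw new UnsupportedOperationException(
                "single-value read on a multi-value column");
        }
        return target.equals(columnValue) ? 1 : 0;
    }
}
```

In this model, wrapping the comparison as (mvCol = 'test') = true corresponds to routing the inner comparison through transformEquals, which is where the exception surfaces.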

kirkrodrigues commented 1 year ago

Hey Jackie, yeah, the query is contrived to expose the bug. I actually encountered it while writing a query rewriter:

  1. SELECT * FROM table WHERE clpMatch("message", '*123*') is parsed into a PinotQuery that's equivalent to SELECT * FROM table WHERE clpMatch("message", '*123*') = true.
  2. Then the custom rewriter replaces clpMatch so the query becomes something like SELECT * FROM table WHERE (message_logtype = ... AND message_dictionaryVars LIKE '*123*' ...) = true, where message_dictionaryVars is an MV column. So the query fails with the above error.

I'm currently working around this by special-casing the boolean comparison with clpMatch, but it would be great to resolve this correctly.
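
A rough sketch of what such a workaround could look like, under stated assumptions: the Expr type below is a hypothetical stand-in for a parsed expression tree (it is not Pinot's expression class), and "equals" is assumed to be the function name used for the = operator, based on the _filter shown in the log above. The helper strips the outer = true only when it wraps the named function, so the inner call can be rewritten as an ordinary filter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; Expr is NOT Pinot's expression type.
final class BoolComparisonSketch {
    static final class Expr {
        final String name;          // function name, or the literal's text
        final boolean isCall;       // true for a function call, false for a literal
        final List<Expr> operands;  // empty for literals

        private Expr(String name, boolean isCall, List<Expr> operands) {
            this.name = name;
            this.isCall = isCall;
            this.operands = operands;
        }

        static Expr call(String name, Expr... operands) {
            return new Expr(name, true, Arrays.asList(operands));
        }

        static Expr literal(String text) {
            return new Expr(text, false, Collections.<Expr>emptyList());
        }
    }

    // Returns the inner call when e has the shape equals(functionName(...), true);
    // otherwise returns e unchanged.
    static Expr unwrapEqualsTrue(Expr e, String functionName) {
        if (!e.isCall || !e.name.equals("equals") || e.operands.size() != 2) {
            return e;
        }
        Expr lhs = e.operands.get(0);
        Expr rhs = e.operands.get(1);
        boolean lhsIsTarget = lhs.isCall && lhs.name.equalsIgnoreCase(functionName);
        boolean rhsIsTrue = !rhs.isCall && rhs.name.equalsIgnoreCase("true");
        return (lhsIsTarget && rhsIsTrue) ? lhs : e;
    }
}
```

Calling unwrapEqualsTrue(filter, "clpMatch") before the replacement step would leave any other filter untouched. This only sidesteps the problem for one function; it does not make MV columns usable inside a boolean comparison in general.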