apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Eager evaluation of case when else end statements in transformation functions #13197

Open diegs opened 1 month ago

diegs commented 1 month ago

We discovered this issue with transformation functions for realtime tables. We have a transform function like the following:

case when length(columnName) = 64 then bytesToBigDecimal(hexToBytes(concat('0000', columnName))) else bytesToBigDecimal(hexToBytes(concat('000000', columnName))) end

The goal of this function is to convert a variable-length signed 256-bit integer, represented in hex, to a BigDecimal. bytesToBigDecimal expects a two-byte scale (here we always use 0000) followed by a two's-complement value. So we need a special case: if the number is 32 bytes (64 hex characters), we assume it is already valid two's complement and prepend only 0000 for the scale. Otherwise, we prepend an extra padding byte of 00 so that the hex is interpreted as a positive two's-complement value.
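To make the byte layout concrete, here is a minimal Java sketch of the conversion as described above. The helpers hexToBytes and bytesToBigDecimal are hypothetical stand-ins written for illustration, not Pinot's implementations; the layout (two-byte scale, then a big-endian two's-complement value) is taken from the description in this issue.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

class ScaledBytesSketch {
    // Hypothetical stand-in for hexToBytes: parses an even-length hex string.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Hypothetical stand-in for bytesToBigDecimal, assuming the layout described
    // in this issue: bytes 0-1 are the scale, the rest is a two's-complement value.
    static BigDecimal bytesToBigDecimal(byte[] bytes) {
        int scale = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        BigInteger unscaled = new BigInteger(Arrays.copyOfRange(bytes, 2, bytes.length));
        return new BigDecimal(unscaled, scale);
    }

    public static void main(String[] args) {
        // Scale prefix only: the leading 1 bit of ff is read as a sign bit.
        System.out.println(bytesToBigDecimal(hexToBytes("0000" + "ff")));   // -1
        // Scale prefix plus one 00 padding byte: the value stays positive.
        System.out.println(bytesToBigDecimal(hexToBytes("000000" + "ff"))); // 255
    }
}
```

The two calls differ only in the extra 00 byte, which is why short values need the longer prefix while full 32-byte values must not get it.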

The problem: because of an upstream bug, an empty hex string was being emitted for columnName instead of 00 to represent zero. We would not expect this to affect the transformation function: length(columnName) is 0, so the else clause should be selected, and bytesToBigDecimal(hexToBytes(concat('000000', ''))) evaluates to 0.

However, we started seeing exceptions in our Pinot logs, and the affected records were dropped. The exception chain is as follows (note that Pinot slightly rewrote the transformation function):

Caused by: java.lang.RuntimeException: Caught exception while executing function: caseWhen(equals(length(columnName),'64'),bytesToBigDecimal(hexToBytes(concat('0000',columnName))),bytesToBigDecimal(hexToBytes(concat('000000',columnName))))
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:241) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator.evaluate(InbuiltFunctionEvaluator.java:113) ~[pinot-all....]
    at org.apache.pinot.segment.local.recordtransformer.ExpressionTransformer.transform(ExpressionTransformer.java:123) ~[pinot-all....]
    ... 7 more
Caused by: java.lang.RuntimeException: Caught exception while executing function: bytesToBigDecimal(hexToBytes(concat('0000',columnName)))
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:241) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:224) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator.evaluate(InbuiltFunctionEvaluator.java:113) ~[pinot-all....]
    at org.apache.pinot.segment.local.recordtransformer.ExpressionTransformer.transform(ExpressionTransformer.java:123) ~[pinot-all....]
    ... 7 more
Caused by: java.lang.IllegalStateException: Caught exception while invoking method: public static java.math.BigDecimal org.apache.pinot.common.function.scalar.DataTypeConversionFunctions.bytesToBigDecimal(byte[]) with arguments: [[B@10fb20f6]
    at org.apache.pinot.common.function.FunctionInvoker.invoke(FunctionInvoker.java:142) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:239) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:224) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator.evaluate(InbuiltFunctionEvaluator.java:113) ~[pinot-all....]
    at org.apache.pinot.segment.local.recordtransformer.ExpressionTransformer.transform(ExpressionTransformer.java:123) ~[pinot-all....]
    ... 7 more
Caused by: java.lang.reflect.InvocationTargetException
    at jdk.internal.reflect.GeneratedMethodAccessor244.invoke(Unknown Source) ~[?:?]
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
    at org.apache.pinot.common.function.FunctionInvoker.invoke(FunctionInvoker.java:139) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:239) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:224) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator.evaluate(InbuiltFunctionEvaluator.java:113) ~[pinot-all....]
    at org.apache.pinot.segment.local.recordtransformer.ExpressionTransformer.transform(ExpressionTransformer.java:123) ~[pinot-all....]
    ... 7 more
Caused by: java.lang.NumberFormatException: Zero length BigInteger
    at java.base/java.math.BigInteger.<init>(BigInteger.java:312) ~[?:?]
    at java.base/java.math.BigInteger.<init>(BigInteger.java:340) ~[?:?]
    at org.apache.pinot.spi.utils.BigDecimalUtils.deserialize(BigDecimalUtils.java:96) ~[pinot-all....]
    at org.apache.pinot.common.function.scalar.DataTypeConversionFunctions.bytesToBigDecimal(DataTypeConversionFunctions.java:90) ~[pinot-all....]
    at jdk.internal.reflect.GeneratedMethodAccessor244.invoke(Unknown Source) ~[?:?]
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
    at org.apache.pinot.common.function.FunctionInvoker.invoke(FunctionInvoker.java:139) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:239) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:224) ~[pinot-all....]
    at org.apache.pinot.segment.local.function.InbuiltFunctionEvaluator.evaluate(InbuiltFunctionEvaluator.java:113) ~[pinot-all....]
    at org.apache.pinot.segment.local.recordtransformer.ExpressionTransformer.transform(ExpressionTransformer.java:123) ~[pinot-all....]
    ... 7 more

No value of the column is 64 characters long (I verified this against the source data), so the first branch should never be evaluated here. The exception shows that it is evaluated anyway: the failing call is the one with the '0000' prefix, and with an empty columnName that leaves no bytes after the two-byte scale, which is consistent with the "Zero length BigInteger" error. Based on some additional experimentation, it seems that both branches of the case when statement in the transformation are evaluated eagerly, and the exception from the failing branch bubbles up and breaks the whole transformation, even though that branch's value would not be used.
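The difference between the two evaluation strategies can be sketched in Java. This is an illustration of the semantics only, not Pinot's evaluator: eagerCase, lazyCase, and the two conversion helpers are hypothetical names, and the byte layout is the one assumed in this issue.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.function.Supplier;

class CaseEvaluationSketch {
    // Hypothetical stand-in for hexToBytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Hypothetical stand-in for bytesToBigDecimal (two-byte scale, then the value).
    static BigDecimal bytesToBigDecimal(byte[] bytes) {
        int scale = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        // With only the two scale bytes present this array is empty, and
        // BigInteger rejects an empty array with a NumberFormatException.
        BigInteger unscaled = new BigInteger(Arrays.copyOfRange(bytes, 2, bytes.length));
        return new BigDecimal(unscaled, scale);
    }

    // Eager: the caller has already computed both branch values.
    static BigDecimal eagerCase(boolean cond, BigDecimal thenValue, BigDecimal elseValue) {
        return cond ? thenValue : elseValue;
    }

    // Lazy: only the selected branch is ever computed.
    static BigDecimal lazyCase(boolean cond, Supplier<BigDecimal> thenBranch,
                               Supplier<BigDecimal> elseBranch) {
        return cond ? thenBranch.get() : elseBranch.get();
    }

    public static void main(String[] args) {
        String columnName = ""; // the empty value that was emitted upstream
        boolean is64 = columnName.length() == 64;

        System.out.println(lazyCase(is64,
            () -> bytesToBigDecimal(hexToBytes("0000" + columnName)),
            () -> bytesToBigDecimal(hexToBytes("000000" + columnName)))); // 0

        try {
            eagerCase(is64,
                bytesToBigDecimal(hexToBytes("0000" + columnName)),   // throws here
                bytesToBigDecimal(hexToBytes("000000" + columnName)));
        } catch (NumberFormatException e) {
            System.out.println("eager evaluation failed: " + e.getMessage());
        }
    }
}
```

With lazy evaluation the unselected branch never runs, so the empty input yields 0; with eager evaluation the unselected branch fails before the condition is even consulted.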

I don't think the case statement should have eager-evaluation semantics, from both a correctness and a performance standpoint. I can't reproduce this with a regular query, so I think it comes from a difference in how transformation functions are evaluated.

diegs commented 1 month ago

Note: the workaround we used was to restructure the transformation function so that the case statement is the innermost expression:

bytesToBigDecimal(hexToBytes(case when length(columnName) = 64 then concat('0000', columnName) else concat('000000', columnName) end))

With this form, both branches only concatenate strings, so the case statement cannot fail even when both are evaluated, and only the selected value is passed to hexToBytes/bytesToBigDecimal.
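A minimal Java sketch of why this ordering is safe even under eager evaluation. As before, eagerCase and the conversion helpers are hypothetical stand-ins for illustration, not Pinot code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

class WorkaroundSketch {
    // Hypothetical stand-in for hexToBytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Hypothetical stand-in for bytesToBigDecimal (two-byte scale, then the value).
    static BigDecimal bytesToBigDecimal(byte[] bytes) {
        int scale = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        BigInteger unscaled = new BigInteger(Arrays.copyOfRange(bytes, 2, bytes.length));
        return new BigDecimal(unscaled, scale);
    }

    // Eager selection: both arguments are computed before the call.
    static <T> T eagerCase(boolean cond, T thenValue, T elseValue) {
        return cond ? thenValue : elseValue;
    }

    static BigDecimal convert(String columnName) {
        // Both branches are plain string concatenations, which cannot throw.
        String padded = eagerCase(columnName.length() == 64,
            "0000" + columnName, "000000" + columnName);
        // The fallible conversion runs once, on the selected string only.
        return bytesToBigDecimal(hexToBytes(padded));
    }

    public static void main(String[] args) {
        System.out.println(convert(""));             // 0
        System.out.println(convert("ff"));           // 255
        System.out.println(convert("f".repeat(64))); // -1
    }
}
```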