diegs opened this issue 1 month ago.
Note: the workaround we used was to invert the transformation function, moving the `case` inside the `hexToBytes`/`bytesToBigDecimal` calls, as follows:
bytesToBigDecimal(hexToBytes(case when length(columnName) = 64 then concat('0000', columnName) else concat('000000', columnName) end))
With this form, the `case` statement only chooses which padded string to build, so it cannot fail even when both branches are evaluated, and the correct value is passed to `hexToBytes`/`bytesToBigDecimal`.
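To see why the workaround is safe, here is a minimal Java model of it (Java because that is what Pinot is written in). This is not Pinot's code: `hexToBytes` and `bytesToBigDecimal` below are hypothetical stand-ins that follow the layout described in this issue (a two-byte scale followed by a two's-complement value), and `convert` is a hypothetical name for the whole transform. Because the `case` only picks the prefix, both alternatives are plain string concatenations and nothing can throw before the single decode.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

public class HexDecimalWorkaround {

    // Hypothetical stand-in for hexToBytes: decodes two hex digits per byte.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Hypothetical stand-in for bytesToBigDecimal, assuming the layout described
    // in the issue: a two-byte big-endian scale, then a two's-complement value.
    static BigDecimal bytesToBigDecimal(byte[] bytes) {
        int scale = (short) (((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF));
        byte[] unscaled = Arrays.copyOfRange(bytes, 2, bytes.length);
        // new BigInteger(byte[]) reads the bytes as two's complement
        // and throws NumberFormatException if the array is empty.
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    // Mirrors the workaround: the case only chooses the prefix, so neither
    // alternative can throw, and the decode runs exactly once.
    static BigDecimal convert(String columnValue) {
        String padded = columnValue.length() == 64
                ? "0000" + columnValue    // full 32-byte value: prepend the scale only
                : "000000" + columnValue; // scale plus a 00 byte to keep the value positive
        return bytesToBigDecimal(hexToBytes(padded));
    }

    public static void main(String[] args) {
        System.out.println(convert(""));              // 0 (empty string from upstream)
        System.out.println(convert("ff"));            // 255
        System.out.println(convert("ff".repeat(32))); // -1 (64 hex digits, negative)
    }
}
```

For the empty string, the `else` prefix yields `000000`: a zero scale plus a single `00` value byte, which decodes to 0.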
We discovered this issue with transformation functions for realtime tables. We have a transform function like the following:
The goal of this function is to convert a variable-length signed 256-bit integer, represented in hex, to a BigDecimal. `bytesToBigDecimal` expects a two's-complement value with a two-byte "scale" prepended (here we always use `0000`). So we need a special case: if the number is 32 bytes (64 hex characters), we assume that it is already correctly encoded as two's complement and only prepend `0000` for the scale. Otherwise, we prepend an extra padding byte of `00` to ensure that the hex is interpreted as a positive two's-complement value.

The issue: because of an upstream bug, the source was emitting an empty hex string instead of `00` for `columnName` to represent zero. We would expect this not to affect the transformation function: `length(columnName)` is `0`, so it should take the `else` branch, and `bytesToBigDecimal(hexToBytes(concat('000000', '')))` evaluates to `0`.

However, we started seeing exceptions in our Pinot logs, and the records were dropped. The exception chain is as follows (note that Pinot slightly rewrote the transformation function):
The length of the column value is definitely not 64 digits (I verified that no values in the source data are 64 digits long), so the first branch should not be evaluated here. For the empty string, that branch computes `bytesToBigDecimal(hexToBytes(concat('0000', '')))`, i.e. two scale bytes with no value bytes at all. The exception indicates that this branch is evaluated anyway, and that the exception it throws fails the whole expression. Based on some additional experimentation, it seems that both branches of the `case when` statement in the transformation are evaluated eagerly, and an exception from the failing branch bubbles up and breaks the transformation function, even though that branch's value would not be used.

I don't think the `case` statement should have eager-evaluation semantics, from both a correctness and a performance standpoint. I can't reproduce this when writing a regular query, so I think it comes from a difference in how transformation functions are evaluated.
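The difference between the behavior we observed and what we expected can be modeled in Java. This is a hypothetical sketch, not Pinot's evaluator: `eagerCase` computes both branch values before choosing (which is what the exception suggests the transform does), while `lazyCase` takes `Supplier`s and runs only the selected branch. The decoder helpers are the same hypothetical stand-ins as above, assuming the value bytes are read with `new BigInteger(byte[])`, which throws `NumberFormatException` on an empty array; under that assumption the `0000` branch fails for an empty input.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.function.Supplier;

public class CaseEvaluationModel {

    // Hypothetical stand-in for hexToBytes: decodes two hex digits per byte.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Hypothetical stand-in for bytesToBigDecimal: two-byte scale, then a
    // two's-complement value; an empty value makes new BigInteger(...) throw.
    static BigDecimal bytesToBigDecimal(byte[] bytes) {
        int scale = (short) (((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF));
        byte[] unscaled = Arrays.copyOfRange(bytes, 2, bytes.length);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    // Eager model: Java evaluates both arguments before the call, so an exception
    // from the branch that would not be selected still escapes.
    static <T> T eagerCase(boolean condition, T thenValue, T elseValue) {
        return condition ? thenValue : elseValue;
    }

    // Lazy model: only the selected branch is evaluated.
    static <T> T lazyCase(boolean condition, Supplier<T> thenBranch, Supplier<T> elseBranch) {
        return condition ? thenBranch.get() : elseBranch.get();
    }

    static BigDecimal eagerConvert(String v) {
        return eagerCase(v.length() == 64,
                bytesToBigDecimal(hexToBytes("0000" + v)),
                bytesToBigDecimal(hexToBytes("000000" + v)));
    }

    static BigDecimal lazyConvert(String v) {
        return lazyCase(v.length() == 64,
                () -> bytesToBigDecimal(hexToBytes("0000" + v)),
                () -> bytesToBigDecimal(hexToBytes("000000" + v)));
    }

    public static void main(String[] args) {
        System.out.println("lazy: " + lazyConvert("")); // lazy: 0
        try {
            eagerConvert("");
        } catch (NumberFormatException e) {
            // "0000" + "" leaves a scale with no value bytes, so the unused branch throws.
            System.out.println("eager failed: " + e.getMessage());
        }
    }
}
```

With lazy semantics the empty string never reaches the `0000` branch, which is the behavior we expected from `case when`.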