Closed by daveqs 2 weeks ago
@colin-ho can you take a look?
Hey @daveqs! The error occurs because Daft is trying to create a schema from the if_else expression, and to do that it needs to check the datatypes of both the left and right sides of the if_else to determine the result type. For the specific expression test2_df['struct_col'].struct.get('key_C'), the datatype cannot be determined because the key does not exist, so the error is thrown.
That being said, we can make your desired behavior work, i.e. modify if_else to check the datatype of only the left or only the right side, depending on the result of the predicate. However, this is only possible for predicates that can be evaluated at planning time: something like daft.lit(idx > -1).if_else will work, but (daft.col("col") > 1).if_else won't, because that predicate depends on the values in the column, which are not known until execution.
Let me know if this is ok for you!
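To make the proposal concrete, here is a minimal toy sketch in plain Python. It is not Daft's implementation and does not use Daft's API: the classes Lit, StructGet, ColGt and IfElse, the resolve_type method, and the schema layout are all hypothetical names invented for illustration. It only shows the idea that a constant predicate lets type resolution skip the branch that is not taken, while a column-dependent predicate still forces both branches to be checked.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Toy schema (hypothetical layout): column name -> type name,
# or a nested dict of field name -> type name for a struct column.
Schema = Dict[str, Any]


class Expr:
    def resolve_type(self, schema: Schema) -> str:
        raise NotImplementedError


@dataclass
class Lit(Expr):
    """A literal whose value (and type) is known at planning time."""
    value: Any

    def resolve_type(self, schema: Schema) -> str:
        return type(self.value).__name__


@dataclass
class StructGet(Expr):
    """Field access on a struct column; fails if the field is missing."""
    column: str
    key: str

    def resolve_type(self, schema: Schema) -> str:
        fields = schema[self.column]
        if self.key not in fields:
            raise ValueError(f"field {self.key!r} not found in {self.column!r}")
        return fields[self.key]


@dataclass
class ColGt(Expr):
    """A predicate that depends on column data, so it is unknown while planning."""
    column: str
    threshold: Any

    def resolve_type(self, schema: Schema) -> str:
        return "bool"


@dataclass
class IfElse(Expr):
    predicate: Expr
    if_true: Expr
    if_false: Expr

    def resolve_type(self, schema: Schema) -> str:
        # Constant boolean predicate: only the selected branch needs a type.
        if isinstance(self.predicate, Lit) and isinstance(self.predicate.value, bool):
            chosen = self.if_true if self.predicate.value else self.if_false
            return chosen.resolve_type(schema)
        # Otherwise both branches must resolve, and their types must agree.
        true_type = self.if_true.resolve_type(schema)
        false_type = self.if_false.resolve_type(schema)
        if true_type != false_type:
            raise ValueError(f"branch types differ: {true_type} vs {false_type}")
        return true_type


schema: Schema = {"struct_col": {"key_A": "int", "key_B": "str"}, "col": "int"}
idx = -1  # stands in for "key_C was not found"

# Predicate is a planning-time constant (False), so the missing key is never resolved.
constant_case = IfElse(Lit(idx > -1), StructGet("struct_col", "key_C"), Lit(0))

# Predicate depends on column values, so both branches are still type-checked.
column_case = IfElse(ColGt("col", 1), StructGet("struct_col", "key_C"), Lit(0))
```

In this sketch, constant_case.resolve_type(schema) resolves using only the right-hand literal, whereas column_case.resolve_type(schema) raises a ValueError for the missing field, mirroring the limitation described above.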
Hi @colin-ho, in my opinion what you described is the preferred behavior for if_else, and it would solve my use case (though I understand the limitation that the predicate must be evaluable during query planning).
Do you see any downside to implementing this behavior?
None that I can think of. I can open a PR for this this week.
Great, thank you!
Hey @daveqs! I just merged the PR with this fix; it should be available in the upcoming release next week!
Describe the bug
When applying the if_else expression to a logical expression, both branches of the if_else must exist regardless of whether the logical is true or false. If one branch exists only when the logical is true but doesn't exist when it is false, or vice versa, Daft raises:
ValueError: DaftError::External Unable to create logical plan node.
To Reproduce
Run the following Python example
Expected behavior
In the above example, I expect test2_df to be identical to test1_df. This would be the case if Daft evaluated only the left side of the if_else() expression when the logical it is applied to (in the example, (daft.lit(idx) > daft.lit(-1))) is true, and only the right side when that logical is false.