Open eejbyfeldt opened 4 days ago
Currently we do not consider the volatility of expressions in SimplifyExpressions. As a result, we perform rewrites that can change query results and lead to unexpected behavior.
Consider the following query:
> explain select * from VALUES (1), (2) where random() = 0 OR (column1 = 2 AND random() = 0);
+---------------+---------------------------------------------+
| plan_type     | plan                                        |
+---------------+---------------------------------------------+
| logical_plan  | Filter: random() = Float64(0)               |
|               |   Values: (Int64(1)), (Int64(2))            |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
|               |   FilterExec: random() = 0                  |
|               |     ValuesExec                              |
|               |                                             |
+---------------+---------------------------------------------+
2 row(s) fetched.
Elapsed 0.013 seconds.
The predicate gets simplified into:
random() = 0
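The predicate should not be simplified in a way that deduplicates the volatile expressions. Each occurrence of random() is a separate evaluation, so the two occurrences are not interchangeable.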
> explain select * from VALUES (1), (2) where random() = 0 OR (column1 = 2 AND random() = 0);
+---------------+----------------------------------------------------------------------------------+
| plan_type     | plan                                                                             |
+---------------+----------------------------------------------------------------------------------+
| logical_plan  | Filter: random() = Float64(0) OR column1 = Int64(2) AND random() = Float64(0)    |
|               |   Values: (Int64(1)), (Int64(2))                                                 |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                      |
|               |   FilterExec: random() = 0                                                       |
|               |     ValuesExec                                                                   |
|               |                                                                                  |
+---------------+----------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.013 seconds.
We cannot exclude volatile expressions from simplification outright, as we would still like to simplify, for example, the following predicate:
> explain select * from VALUES (1), (2) where column1 = 2 OR (column1 = 2 AND random() = 0);
+---------------+---------------------------------------------+
| plan_type     | plan                                        |
+---------------+---------------------------------------------+
| logical_plan  | Filter: column1 = Int64(2)                  |
|               |   Values: (Int64(1)), (Int64(2))            |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
|               |   FilterExec: column1@0 = 2                 |
|               |     ValuesExec                              |
|               |                                             |
+---------------+---------------------------------------------+
2 row(s) fetched.
Elapsed 0.015 seconds.
This rewrite is fine because the deduplicated term, column1 = 2, is not volatile, so removing the volatile branch does not change the result.
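The distinction between the two queries can be sketched as a volatility-aware absorption rule. This is only an illustration on a hypothetical miniature expression tree, not DataFusion's actual `Expr` type or the SimplifyExpressions implementation; every name below (`Expr`, `is_volatile`, `simplify_or`, and the small constructor helpers) is an assumption made for the example. The rule rewrites `a OR (a AND b)` to `a` only when `a` contains no volatile call.

```rust
// Hypothetical miniature expression tree for illustration only.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    // A function call; `volatile` marks calls whose result may differ per evaluation.
    Call { name: String, volatile: bool },
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

// Small constructors to keep the examples readable (all hypothetical).
fn col(name: &str) -> Expr { Expr::Column(name.to_string()) }
fn lit(v: i64) -> Expr { Expr::Literal(v) }
fn random() -> Expr { Expr::Call { name: "random".to_string(), volatile: true } }
fn eq(l: Expr, r: Expr) -> Expr { Expr::Eq(Box::new(l), Box::new(r)) }
fn and(l: Expr, r: Expr) -> Expr { Expr::And(Box::new(l), Box::new(r)) }
fn or(l: Expr, r: Expr) -> Expr { Expr::Or(Box::new(l), Box::new(r)) }

/// Returns true if any sub-expression is a volatile call.
fn is_volatile(e: &Expr) -> bool {
    match e {
        Expr::Column(_) | Expr::Literal(_) => false,
        Expr::Call { volatile, .. } => *volatile,
        Expr::Eq(l, r) | Expr::And(l, r) | Expr::Or(l, r) => {
            is_volatile(l) || is_volatile(r)
        }
    }
}

/// Absorption rule: `a OR (a AND b)` => `a`.
/// Skipped when `a` is volatile, because two structurally equal volatile
/// expressions are separate evaluations and cannot be merged into one.
fn simplify_or(e: &Expr) -> Expr {
    if let Expr::Or(a, rhs) = e {
        if let Expr::And(l, r) = rhs.as_ref() {
            let absorbs = l == a || r == a;
            if absorbs && !is_volatile(a) {
                return a.as_ref().clone();
            }
        }
    }
    // No applicable rewrite: return the expression unchanged.
    e.clone()
}
```

Under these assumptions, `simplify_or` leaves `random() = 0 OR (column1 = 2 AND random() = 0)` untouched, while `column1 = 2 OR (column1 = 2 AND random() = 0)` still collapses to `column1 = 2`. The check is on the term being deduplicated rather than on the whole predicate, which is why the second query keeps its simplification.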
take