Open tv42 opened 9 months ago
This might share the root cause with #8909.
This may be fixed by https://github.com/apache/arrow-datafusion/pull/8928
This may be fixed by https://github.com/apache/arrow-datafusion/pull/8928
This issue is related with #8928 but is different with it.
Let me have explain this problem: It's cause by expression eval
.
When we try to fold expression
, we will eval expression. When we eval expression, it may occur Runtime Error.
But short-circuit expression
can be short-circuit
in Runtime, so I think we should skip some error and then use original (a possible solution) expression when fold short-circuit expression
.
short-circuit expression
to extract common expression, but can't avoid fold expression
how do you think about it? @alamb @liukun4515
BTW, Are @haohuaijin interested in this? If yes, when you meet some problem or need review, please feel free to @me.
Thank you @jackwener. I am interested in it. I have two solutions.
We can first fold the expression whenever, and if we encounter an error, return the origin expression in the below code, and the error will occur in runtime if it is a real error(not the short-circuit problem). https://github.com/apache/arrow-datafusion/blob/2b218be67a6c412629530b812836a6cec76efc32/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L282-L287
don't simplify short-circuit expressions' all sub-expressions.
I think this issue is also related to https://github.com/apache/arrow-datafusion/issues/8927, because the scalar function first evaluates all args and then execution the actual function, so even though we don't fold the expression, the query in this issue will also throw error.
I comment out SimplifyExpressions
in the main branch and got the following result.
❯ SELECT COALESCE(1, 2/0);
Arrow error: Divide by zero error
❯ explain SELECT COALESCE(1, 2/0);
+---------------+-------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-------------------------------------------------------------------------------------+
| logical_plan | Projection: coalesce(Int64(1), Int64(2) / Int64(0)) |
| | EmptyRelation |
| physical_plan | ProjectionExec: expr=[coalesce(1, 2 / 0) as coalesce(Int64(1),Int64(2) / Int64(0))] |
| | PlaceholderRowExec |
| | |
+---------------+-------------------------------------------------------------------------------------+
2 rows in set. Query took 0.007 seconds.
We can first fold the expression whenever, and if we encounter an error, return the origin expression in the below code, and the error will occur in runtime if it is a real error(not the short-circuit problem).
This makes sense to me @haohuaijin and @jackwener - I also played around with posgres and this seems consistent with what it does.
Describe the bug
Datafusion evaluates
COALESCE(a, b, c)
expressions even past the first non-NULL value, and bubbles up runtime errors from them.This is wrong because the COALESCE could be explicitly protecting against e.g. the divide-by-zero case, and differs from other sql engines.
To Reproduce
Expected behavior
COALESCE
to return the first non-NULL value, not bubble up errors from the rest.Compare to SQLite:
Compare to Postgres:
Additional context
No response