Open benesch opened 3 years ago
This also requires a bit of optimizer thinking, as noted here, ~but should be possible to implement~. Actually, this seems quite hard; see below.
We might want to ban subqueries, as explained in the Slack thread.
Additionally, we'll have to make sure that no optimization moves the erroring part of the scalar expression from inside the `iferror` into a different operator. If the erroring part of the scalar expression ends up in a different operator, then the scalar error would be converted to a dataflow error before reaching the `iferror`, and `iferror` can only handle scalar errors.
For example, `MapFilterProject::optimize` can change an MFP quite a bit; certainly enough that the erroring part of the argument expression of an `IFERROR` call ends up in a different scalar expression inside the same MFP. This is not a problem in itself, but it might make it easier for other optimizations to move that expression around. And there are various other optimizations that move scalar expressions around, e.g., inlining them into expressions in later operators, or pushing `Filter` predicates downward. Also, some parts of some MFPs get absorbed into neighboring operators during lowering to LIR; e.g., an MFP on top of a `Join` can be broken down into several MFPs, which are applied at various stages of the join.
Related to the above, I'm not sure we can handle errors coming from aggregations: if we have `IFERROR(SUM(x/0))`, then the `x/0` will be executed in a different operator from the `IFERROR`, so we have the same problem as above (even without any optimizations).
I have an idea for solving the "optimizer moving things around" problem from above. It's a variation of this idea. We could add a new field called `dont_error` to the `CallUnary`, `CallBinary`, and `CallVariadic` `MirScalarExpr` variants, which would cause the `eval` function to catch an `EvalError` and convert it into a special `Datum` variant called something like `error_that_will_be_caught_by_iferror`. This would have similar semantics to `null`, but the standard null-handling functions (e.g., `IS NULL`) wouldn't recognize it, so it could freely propagate through expression evaluation until it is caught. Then we could add a special pass somewhere in HIR that sets this field on all function calls inside an `IFERROR` argument expression. This way, no matter where the various subexpressions of the argument expression of `IFERROR` end up, they wouldn't produce errors, but just this special datum, which would eventually reach the `IFERROR`, which would handle it appropriately. (Note that we can't simply use `null` instead of this special datum, because the argument expression of `IFERROR` might have an `IS NULL` or other null-handling function somewhere, which would spuriously catch the null before it reached the `IFERROR`.)
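A minimal sketch of this `dont_error` idea, using made-up stand-ins (`Datum`, `EvalError`, `ErrorToBeCaught`) rather than the real Materialize types:

```rust
// Hypothetical sketch: a simplified Datum with a special error-carrying
// variant that propagates like a value until IFERROR catches it.
#[derive(Debug, Clone, PartialEq)]
enum Datum {
    Int(i64),
    Null,
    ErrorToBeCaught,
}

#[derive(Debug)]
struct EvalError(String);

// A binary call (division) with the proposed `dont_error` flag.
fn eval_div(lhs: Datum, rhs: Datum, dont_error: bool) -> Result<Datum, EvalError> {
    match (lhs, rhs) {
        // The special datum propagates through evaluation untouched.
        (Datum::ErrorToBeCaught, _) | (_, Datum::ErrorToBeCaught) => {
            Ok(Datum::ErrorToBeCaught)
        }
        (Datum::Int(_), Datum::Int(0)) => {
            if dont_error {
                // Convert the scalar error into the special datum instead of
                // raising it, so it can travel across operators unharmed.
                Ok(Datum::ErrorToBeCaught)
            } else {
                Err(EvalError("division by zero".into()))
            }
        }
        (Datum::Int(a), Datum::Int(b)) => Ok(Datum::Int(a / b)),
        _ => Ok(Datum::Null),
    }
}

// IFERROR itself: replace the special datum with the fallback value.
fn eval_iferror(arg: Datum, fallback: Datum) -> Datum {
    match arg {
        Datum::ErrorToBeCaught => fallback,
        other => other,
    }
}

fn main() {
    // With dont_error set, x/0 yields the special datum, which IFERROR catches.
    let inner = eval_div(Datum::Int(1), Datum::Int(0), true).unwrap();
    assert_eq!(eval_iferror(inner, Datum::Int(-1)), Datum::Int(-1));
    // Without the flag, the same expression raises a scalar error.
    assert!(eval_div(Datum::Int(1), Datum::Int(0), false).is_err());
}
```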
This doesn't solve the problem with the `Filter error("more than one record produced in subquery")` pattern, so we would have the same limitation as Cockroach, but it might solve all the other problems above. (Aggregate functions would need to handle this special datum.)
Thanks for thinking about this and writing these thoughts up, @ggevay!
Recording some thoughts from a Slack discussion.
The complexity you've uncovered makes me think there are dragons here that we might prefer to avoid.
As an alternative to a general `iferror` function, we could instead introduce `try_` variants of each function that return `NULL` instead of erroring. For example, `try_to_timestamp` would be the error-free variant of `to_timestamp`; `try_cast` would be the error-free variant of `cast`.
The `try_` approach is appealing because it is not a general-purpose exception-handling mechanism. You don't need to worry about handling an error that occurred deep inside an expression (e.g., `iferror(fn1(fn2(error_producing_function())))`); the try-able semantics apply only to specific functions.
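To illustrate how narrow the `try_` semantics are, here is a hypothetical sketch that models a fallible cast as a Rust function returning `Result`; the names are invented for the example and are not Materialize APIs:

```rust
// A fallible cast: text to integer, erroring on bad input.
fn cast_text_to_int(s: &str) -> Result<i64, String> {
    s.trim()
        .parse::<i64>()
        .map_err(|e| format!("invalid input syntax: {e}"))
}

// The try_ variant is a thin wrapper: any error becomes NULL (None).
// Crucially, it only absorbs errors from this one function, not from
// arbitrary subexpressions of a larger expression.
fn try_cast_text_to_int(s: &str) -> Option<i64> {
    cast_text_to_int(s).ok()
}

fn main() {
    assert_eq!(try_cast_text_to_int("42"), Some(42));
    assert_eq!(try_cast_text_to_int("not a number"), None);
}
```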
Open question: is `try_` flexible enough? The described semantics mean that the `try_` variant of each function replaces errors with `NULL`. But users might need to replace only some errors with `NULL`, or to distinguish between the function returning `NULL` without erroring and the function erroring. `try_` won't enable any of these use cases.
The `try_` variant approach sounds good to me.
> Open question: is `try_` flexible enough? The described semantics mean that the `try_` variant of each function replaces errors with `NULL`. But users might need to replace only some errors with `NULL`, or distinguish between the function returning `NULL` without erroring, and the function erroring. `try_` won't enable any of these use cases.
We could do something like Cockroach's `IFERROR` by giving the `try_` variants some extra arguments:

- an argument called `val_on_error`, which would be returned on an error;
- an `error_code` argument, so that only errors with that code would be caught.

This would make the implementation slightly more complex, because the arity of these functions would be different from the originals, but I think it's not too bad.
Yeah, I was thinking along those lines, but I'm worried that it'll read poorly. E.g., consider `make_timestamp` and a hypothetical `try_make_timestamp`:
```sql
SELECT make_timestamp(2020, 10, 25, 9, 30, 17.12);
SELECT try_make_timestamp(2020, 10, 25, 9, 30, 17.12, NULL);
```
It's already unreadable soup with the 6 arguments, and the `NULL` doesn't obviously read as "use this on error."
We might also want to consider syntax like:
```sql
SELECT make_timestamp(2020, 10, 25, 9, 30, 17.12 IF ERROR WHEN <code> THEN <expr> WHEN <code> THEN <expr> ELSE <expr>);
```
Idk, that's ugly too, but maybe it reads a bit better: it looks like the existing `CASE` syntax, and a little like `CAST(... AS ...)` and `EXTRACT(... FROM ...)`, where SQL bends the usual rules for function-call parameters.
Personal take: I think the proposal where `try_` very simply replaces all errors with `NULL`s is useful enough to be worth the simple and clear syntax, even though it is limited in functionality. If we find users asking to extend the semantics of `try_` functions (e.g., changing the value on error, or only handling errors with certain codes), at that point I think we should consider dedicated syntax.
A general principle in PL design is that for any partial operation, there should exist either (a) an easy way to predict whether or not it will succeed, or (b) a way to handle its failure. `TRYCAST` is (b), but I can imagine doing (a) instead: go through each `*Func` in `ScalarExpr` that can fail, add metadata to say what its "success predicate" is, and then explicitly mention these in the docs.
E.g., the success predicate for division on integer types is that the second number is not 0; the success predicate for square root is that its input is a non-negative number; the success predicate for casting text to a number is that the text matches some fixed regexp.
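One way to sketch this metadata, with made-up names: a success predicate is just a pure function over the inputs that is true exactly when evaluation would succeed.

```rust
// Hypothetical registry entry: a fallible function paired with a predicate
// over its (numeric) inputs that holds iff evaluation will succeed.
struct FallibleFunc {
    name: &'static str,
    success: fn(f64, f64) -> bool,
}

// Division succeeds iff the divisor is nonzero.
fn div_succeeds(_a: f64, b: f64) -> bool {
    b != 0.0
}

// Square root is unary; the second argument is ignored in this sketch.
fn sqrt_succeeds(a: f64, _b: f64) -> bool {
    a >= 0.0
}

const FUNCS: &[FallibleFunc] = &[
    FallibleFunc { name: "div", success: div_succeeds },
    FallibleFunc { name: "sqrt", success: sqrt_succeeds },
];

// Look up a function by name and evaluate its success predicate.
fn will_succeed(name: &str, a: f64, b: f64) -> Option<bool> {
    FUNCS.iter().find(|f| f.name == name).map(|f| (f.success)(a, b))
}

fn main() {
    assert_eq!(will_succeed("div", 1.0, 0.0), Some(false));
    assert_eq!(will_succeed("sqrt", -1.0, 0.0), Some(false));
    assert_eq!(will_succeed("sqrt", 4.0, 0.0), Some(true));
}
```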
> E.g., the success predicate for division on integer types is that the second number is not 0; the success predicate for square root is that its input is a non-negative number; the success predicate for casting a number to text is matching some fixed regexp.

How would you describe the success predicate for something like casting to `timestamptz`? The supported formats are much more complex than what a regex can describe.
That is a messy one! I think most languages would revert to an error-handling solution. If for some reason we wanted to not bother with a full `TRYCAST`, you could write a custom `TRYCAST` for such individual cases. Probably easier to just do `TRYCAST`.
Consider a source in Materialize that has a text column that is sometimes JSON and sometimes not. If the user creates a view that parses this JSON, queries on that view like `SELECT * FROM v` will fail with "invalid json", without telling you which record had invalid JSON, or allowing you to ignore that one record.

One proposal, via @justinj, is to import Cockroach's `IFERROR` function (https://github.com/cockroachdb/cockroach/pull/25304), which makes it possible to inspect the errors from SQL.