Closed milenkovicm closed 6 days ago
the change is quite similar to #9906 there are quite a lot optional/unused parameters in the method call, but i personally find it easier to understand as it is equivalent to AggregateFunction signature.
I wonder if ExprSimplifyResult::<T>::Original(T)
where T can be something else than args
(tuples included) would be a backward compatible change. This way we could send and return all required parameters without copy ... but this is discussion for different time
jsut thinking aloud, maybe if we change:
pub enum ExprSimplifyResult {
/// The function call was simplified to an entirely new Expr
Simplified(Expr),
/// the function call could not be simplified, and the arguments
/// are return unmodified.
Original(Vec<Expr>),
}
to:
pub enum ExprSimplifyResult<T> {
/// The function call was simplified to an entirely new Expr
Simplified(Expr),
/// the function call could not be simplified, and the arguments
/// are return unmodified.
Original(T),
}
simplify method would change from:
pub fn simplify(
&self,
args: Vec<Expr>,
distinct: &bool,
filter: &Option<Box<Expr>>,
order_by: &Option<Vec<Expr>>,
null_treatment: &Option<NullTreatment>,
info: &dyn SimplifyInfo,
) -> Result<ExprSimplifyResult> {
to:
pub fn simplify(
&self,
args: Vec<Expr>,
distinct: bool,
filter: Option<Box<Expr>>,
order_by: Option<Vec<Expr>>,
null_treatment: Option<NullTreatment>,
info: &dyn SimplifyInfo,
) -> Result<ExprSimplifyResult<(Vec<Expr>, bool, Option<Box<Expr>> ...)>> {
}
so in both cases, when we create replacement function or we re-assemble original function we would not need to clone parameters.
maybe hand full of changes, from -> Result<ExprSimplifyResult>
to -> Result<ExprSimplifyResult<Vec<Expr>>>
Downside is that we would have to re-create expression if no simplification, plus change would be backward incompatible
pub enum ExprSimplifyResult<T> { /// The function call was simplified to an entirely new Expr Simplified(Expr), /// the function call could not be simplified, and the arguments /// are return unmodified. Original(T), }
I think it is a good idea.
jsut thinking aloud, maybe if we change:
I think if we are going to change the signature, I think we should use Transformed directly (and it is already parameterized)
Another option could be to make a simplify return optional closure
fn simplify(&self ) -> Option<Box<dyn Fn(AggregateFunction) -> datafusion_common::tree_node::Transformed<Expr>>> {
None
}
in this case expr_simplifier part would become:
Expr::AggregateFunction(AggregateFunction {func_def: AggregateFunctionDefinition::UDF(ref udaf), ..}) => {
match (udaf.simplify(), expr) {
(Some(simplify), Expr::AggregateFunction(a)) => simplify(a),
(_, e) => Transformed::no(e),
}
}
this solution:
other issues:
SimplifyInfo
is missing from signature, can be addedTransformed::no
does not make sense in this context as original expression has not been propagated (should we use actual expression instead of AggregateFunction or remove Transformed
from return typeTransformed
Another option is to introduce has_simplify
so we can simplify the code to
Expr::AggregateFunction(AggregateFunction {
func_def: AggregateFunctionDefinition::UDF(udaf),
args: AggregateFunctionArgs,
}) if udaf.has_simplify() => {
Transformed::yes(udaf.simplify()?)
}
fn simplify() -> Result<Expr>
The downside is that the user needs to both define simplify
and set has_simplify
to true, but I think it is much simpler than the optional closure
The downside is that the user needs to both define
simplify
and sethas_simplify
to true, but I think it is much simpler than the optional closure
I personally try to avoid situations like this when creating API, this should be documented that both functions have to be implemented plus it introduces more corner cases.
Anyway, closure was a brain dump from yesterday, after a good sleep I don't think we need it, will try to elaborate in next comment
fn simplify(
&self,
expr: Expr,
) -> Result<datafusion_common::tree_node::Transformed<Expr>> {
// we know it'll always be AggregateFunction but we have to match on it
// which is not terribly hard but it makes api a bit odd
if let Expr::AggregateFunction(agg) = expr {
...
Ok(datafusion_common::tree_node::Transformed::yes(...))
} else {
// no re-creation of expression if not used
Ok(datafusion_common::tree_node::Transformed::no(expr))
}
}
default implementation is just:
fn simplify(
&self,
expr: Expr,
) -> Result<datafusion_common::tree_node::Transformed<Expr>> {
// no re-creation of expression if not used
Ok(datafusion_common::tree_node::Transformed::no(expr))
}
fn simplify(
&self,
agg: AggregateFunction,
) -> Result<datafusion_common::tree_node::Transformed<Expr>> {
// we know its aggregate function, no need to do extra matching, but we need to re-create expression
// issue user can return `no` with new expression in it
Ok(datafusion_common::tree_node::Transformed::no(Expr::AggregateFunction(agg)))
}
ExprSimplifyResult
fn simplify(
&self,
agg: AggregateFunction,
) -> Result<ExprSimplifyResult<AggregateFunction>> {
// user cant return `no` with new expression
// simplifier will re-crete expression in case original has been returned
Ok(ExprSimplifyResult::Original(agg))
}
IMHO, I prefer option number 3, which would involve parametrizing ExprSimplifyResult
, second best is option number 2.
wdyt @jayzhan211, @alamb
Option 1 - Original expression as a parameter
fn simplify( &self, expr: Expr, ) -> Result<datafusion_common::tree_node::Transformed<Expr>> { // we know it'll always be AggregateFunction but we have to match on it // which is not terribly hard but it makes api a bit odd if let Expr::AggregateFunction(agg) = expr { ... Ok(datafusion_common::tree_node::Transformed::yes(...)) } else { // no re-creation of expression if not used Ok(datafusion_common::tree_node::Transformed::no(expr)) } }
default implementation is just:
fn simplify( &self, expr: Expr, ) -> Result<datafusion_common::tree_node::Transformed<Expr>> { // no re-creation of expression if not used Ok(datafusion_common::tree_node::Transformed::no(expr)) }
Option 2 - AggregationFunction as a parameter
fn simplify( &self, agg: AggregateFunction, ) -> Result<datafusion_common::tree_node::Transformed<Expr>> { // we know its aggregate function, no need to do extra matching, but we need to re-create expression // issue user can return `no` with new expression in it Ok(datafusion_common::tree_node::Transformed::no(Expr::AggregateFunction(agg))) }
Option 3 - AggregationFunction as a parameter with parametrised
ExprSimplifyResult
fn simplify( &self, agg: AggregateFunction, ) -> Result<ExprSimplifyResult<AggregateFunction>> { // user cant return `no` with new expression // simplifier will re-crete expression in case original has been returned Ok(ExprSimplifyResult::Original(agg)) }
IMHO, I prefer option number 3, which would involve parametrizing
ExprSimplifyResult
, second best is option number 2. wdyt @jayzhan211, @alamb
Given these 3, I prefer 2, because I think re-create expression is not a high cost that we should avoid. And we can do even further,
If user does not implement simplify, it is equivalent to Transformed::no or ExprSimplifyResult::Original, so we can just have Result<Expr>
fn simplify(
&self,
agg: AggregateFunction,
) -> Result<Expr> {
Ok(Expr::AggregateFunction(agg))
}
I apologise for making noise, but it looks like all of the non closure options have issue with borrowing:
error[E0505]: cannot move out of `expr.0` because it is borrowed
--> datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:1391:48
|
1388 | func_def: AggregateFunctionDefinition::UDF(ref udaf),
| -------- borrow of `expr.0.func_def.0` occurs here
...
1391 | if let Expr::AggregateFunction(aggregate_function) = expr {
| ^^^^^^^^^^^^^^^^^^ move out of `expr.0` occurs here
1392 | udaf.simplify(aggregate_function, info)?
| ---- borrow later used here
It looks like AggregateArg
might be back on the table, which would involve cloning udf
@jayzhan211 and @alamb whenever you get chance please have a look, IMHO I find proposal with closure to tick all the boxes, but this one is definitely simpler.
@jayzhan211 and @alamb whenever you get chance please have a look, IMHO I find proposal with closure to tick all the boxes, but this one is definitely simpler.
After playing around, I feel like maybe optional closure is our answer 🤔
fn simplify(
&self,
_info: &dyn SimplifyInfo,
) -> Option<Box<dyn Fn(AggregateArgs) -> Expr>> {}
After playing around, I feel like maybe optional closure is our answer 🤔
Damn! I should have created a branch with that code 😀, if we have consensus on that direction I'll redo it
After playing around, I feel like maybe optional closure is our answer 🤔
Damn! I should have created a branch with that code 😀, if we have consensus on that direction I'll redo it
My final answer for today is
// UDAF
fn simplify(
&self,
args: AggregateArgs,
_info: &dyn SimplifyInfo,
) -> Option<Box<dyn Fn(AggregateArgs) -> Result<Expr>>> {
// UDF, not necessary but for consistency
fn simplify(
&self,
args: Vec<Expr>,
_info: &dyn SimplifyInfo,
) -> Option<Box<dyn Fn(Vec<Expr>) -> Result<Expr>>> {
Go with optional closure!
My final answer for today is
fn simplify( &self, args: AggregateArgs, _info: &dyn SimplifyInfo, ) -> Option<Box<dyn Fn(AggregateArgs) -> Result<Expr>>> {
Go with optional closure!
you mean
// UDF
fn simplify(
&self,
) -> Option<Box<dyn Fn(AggregateFunction, &dyn SimplifyInfo) -> Result<Expr>>> {
or
// UDF
fn simplify(
&self,
info: &dyn SimplifyInfo
) -> Option<Box<dyn Fn(AggregateFunction) -> Result<Expr>>> {
It should be this.
// UDF
fn simplify(
&self,
) -> Option<Box<dyn Fn(AggregateFunction, &dyn SimplifyInfo) -> Result<Expr>>> {
The reason for optional closure is that I assume we need to return Expr
for simplified cases. If we expect AggregateArgs
, then we might not need closure.
We're on the same page @jayzhan211
We're on the same page @jayzhan211
If we don't need Expr
for simplified UDAF, than we can have
pub fn simplify(
&self,
args: AggregateArgs,
info: &dyn SimplifyInfo,
) -> Result<Transformed<AggregateArgs>> {}
I think the next problem is whether we need Expr
or just AggregateArgs
. The former can let us return the UDAF with a different function. But for this kind of function rewrite, we can handle it in FuncitonWrite
. You can take ArrayFunctionRewriter
for example.
@milenkovicm
We're on the same page @jayzhan211
If we don't need
Expr
for simplified UDAF, than we can havepub fn simplify( &self, args: AggregateArgs, info: &dyn SimplifyInfo, ) -> Result<Transformed<AggregateArgs>> {}
I think the next problem is whether we need
Expr
or justAggregateArgs
. The former can let us return the UDAF with a different function. But for this kind of function rewrite, we can handle it inFuncitonWrite
. You can takeArrayFunctionRewriter
for example.@milenkovicm
I think we need Expr, this way we can replace function with something else
We're on the same page @jayzhan211
If we don't need
Expr
for simplified UDAF, than we can havepub fn simplify( &self, args: AggregateArgs, info: &dyn SimplifyInfo, ) -> Result<Transformed<AggregateArgs>> {}
I think the next problem is whether we need
Expr
or justAggregateArgs
. The former can let us return the UDAF with a different function. But for this kind of function rewrite, we can handle it inFuncitonWrite
. You can takeArrayFunctionRewriter
for example. @milenkovicmI think we need Expr, this way we can replace function with something else
I agree.
You can define closure in datafusion/expr/src/function.rs
Thanks again @jayzhan211 and @milenkovicm 🙏
Which issue does this PR close?
Closes #9526.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
yes, added new test and example
Are there any user-facing changes?
no, changes are backward compatible