Open sunng87 opened 1 week ago
@sunng87 We are going to remove the wrapper layer of our UDF/UDAF and use datafusion's UDF API in the future. Not sure if this issue can benefit from it.
@evenyag If we use datafusion's API, are we still using our own Vector as input?
No, we process arrow's arrays directly. Writing a simple UDF should be easy. https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/simple_udf.rs
What type of enhancement is this?
Refactor
What does the enhancement do?
The idea is to create a high-level framework for UDF development (not UDAF) that removes boilerplate code and improves ergonomics.
The core responsibility of this framework is to provide:
Current status
At the moment, a typical UDF implementation looks like this one: https://github.com/GreptimeTeam/greptimedb/blob/main/src/common/function/src/scalars/geo/h3.rs#L95
Basically, we do the following steps to generate the result vector:
Desired state
Because every implementation has to do steps 1/2/3/5, an ergonomic solution is to provide a declarative way to extract Rust data types from column vectors, so that the user can simply focus on calling a Rust function. The implementation of a UDF should be stateless, so until we have a real use case, we don't need to provide any execution context other than the original FunctionContext.
Inspired by how axum designed its web handlers. The API looks like
FunctionExtN will provide a default implementation for Function::eval.
TODO: think about how to deal with R
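A rough sketch of one way the FunctionExtN idea could be read, not the proposed API. It assumes a two-argument variant, FunctionExt2, whose default eval does the per-row extraction and calls a typed Rust function. Only FunctionContext, FunctionExtN and Function::eval come from this issue. Column, FromColumn, IntoColumn and Scale are hypothetical names, Column is a simplified stand-in for the real vector types, and the rule that a null argument yields a null result is an assumption. R is handled here through a hypothetical IntoColumn trait, which is only one possible answer to the TODO.

```rust
/// Hypothetical stand-in for a column vector: one nullable value per row.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Float64(Vec<Option<f64>>),
    Utf8(Vec<Option<String>>),
}

impl Column {
    pub fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Utf8(v) => v.len(),
        }
    }
}

/// Hypothetical stand-in for the existing FunctionContext.
#[derive(Debug, Default)]
pub struct FunctionContext;

/// Extracts a typed Rust value from one row of a column,
/// in the spirit of axum's extractors (hypothetical trait).
pub trait FromColumn: Sized {
    fn from_column(col: &Column, row: usize) -> Result<Option<Self>, String>;
}

impl FromColumn for f64 {
    fn from_column(col: &Column, row: usize) -> Result<Option<Self>, String> {
        match col {
            Column::Float64(v) => Ok(v[row]),
            other => Err(format!("expected a Float64 column, got {other:?}")),
        }
    }
}

/// Collects per-row results of type R back into a column (hypothetical trait).
pub trait IntoColumn: Sized {
    fn into_column(values: Vec<Option<Self>>) -> Column;
}

impl IntoColumn for f64 {
    fn into_column(values: Vec<Option<Self>>) -> Column {
        Column::Float64(values)
    }
}

/// Two-argument variant of the FunctionExtN family.
/// The user implements `call`; `eval` comes for free.
pub trait FunctionExt2 {
    type A: FromColumn;
    type B: FromColumn;
    type R: IntoColumn;

    /// The plain Rust function the UDF author writes.
    fn call(&self, ctx: &FunctionContext, a: Self::A, b: Self::B) -> Self::R;

    /// Default row loop: check arity and lengths, extract, call, collect.
    fn eval(&self, ctx: &FunctionContext, columns: &[Column]) -> Result<Column, String> {
        let [a, b] = columns else {
            return Err(format!("expected 2 columns, got {}", columns.len()));
        };
        if a.len() != b.len() {
            return Err("columns differ in length".to_string());
        }
        let mut out = Vec::with_capacity(a.len());
        for row in 0..a.len() {
            let x = <Self::A as FromColumn>::from_column(a, row)?;
            let y = <Self::B as FromColumn>::from_column(b, row)?;
            // Assumption of this sketch: a null argument yields a null result.
            out.push(match (x, y) {
                (Some(x), Some(y)) => Some(self.call(ctx, x, y)),
                _ => None,
            });
        }
        Ok(<Self::R as IntoColumn>::into_column(out))
    }
}

/// Hypothetical example UDF: multiplies a value by a factor.
pub struct Scale;

impl FunctionExt2 for Scale {
    type A = f64;
    type B = f64;
    type R = f64;

    fn call(&self, _ctx: &FunctionContext, value: f64, factor: f64) -> f64 {
        value * factor
    }
}
```

With this shape, the UDF author only writes the body of call. Calling Scale.eval with two Float64 columns returns a Float64 column of the row-wise products, and a column of the wrong type or a wrong argument count comes back as an Err instead of being handled by hand in every function.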
Limitation
Documentation
A procedural macro is preferred in this case, for two types of usage:
Implementation challenges
No response