Open marin-ma opened 2 months ago
This change may affect your existing workloads. @kecookier
@kecookier did you use Hive UDFs or Spark UDFs? For Hive UDFs, the SQL must contain a CREATE TEMPORARY FUNCTION
statement; the function is registered there, so we don't need to register it in native code.
For Spark UDFs, we still don't have a good way to catch them.
@FelixYBW @marin-ma Thanks, we use Hive UDFs and have CREATE TEMPORARY FUNCTION
in our ETL.
cc @NEUpanning
Description
Currently, a loaded native UDF is registered into Spark's FunctionRegistry so that it can pass SQL analysis. However, this registration is opaque to users, and it is not the standard way to use a Java UDF in Spark. The native UDF framework should be improved so that a native UDF is only usable after the function has been explicitly registered through
CREATE TEMPORARY FUNCTION
. This approach not only exposes the UDF registration process to users, it also guarantees that a Java version of the UDF is always available in case of fallback.

At runtime, native UDFs should be registered alongside their Java implementations via
CREATE TEMPORARY FUNCTION
. Once registered, Gluten can parse and offload these UDFs to Velox during execution, while ensuring proper fallback to the Java UDFs when necessary.

This is a breaking change.