apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] Remove native udf registration #7015

Open marin-ma opened 2 months ago

marin-ma commented 2 months ago

Description

Currently, loaded native UDFs are registered in Spark's FunctionRegistry so that they pass SQL analysis. However, this registration is opaque to users, and it is not how Java UDFs are normally used in Spark. The native UDF framework should be changed so that a native UDF is usable only after the function has been explicitly registered through CREATE TEMPORARY FUNCTION. This approach not only exposes the UDF registration process to users, but also guarantees that a Java version of the UDF is always available in case of fallback.

At runtime, native UDFs should be registered alongside their Java implementations via CREATE TEMPORARY FUNCTION. Once registered, Gluten can parse these UDFs and offload them to Velox during execution, while still falling back to the Java UDFs when necessary.
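
To make the proposed behavior concrete, here is a minimal toy model in Python. It is only a sketch of the lookup rule described above, not Gluten's code: `UdfCatalog`, its methods, and the class name `com.example.MyUdf` are all hypothetical. The one assumption it encodes is that a function resolves only if the user registered it explicitly, and that it is offloaded only if a native implementation with the same name was also loaded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UdfCatalog:
    """Toy model of the proposed lookup rule (hypothetical, not Gluten's API)."""

    # Functions the user registered explicitly, e.g. with
    #   CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf';
    # mapped to their (hypothetical) Java class name.
    java_udfs: dict[str, str] = field(default_factory=dict)
    # Function names for which a native implementation has been loaded.
    native_udfs: set[str] = field(default_factory=set)

    def create_temporary_function(self, name: str, java_class: str) -> None:
        # Explicit registration is the only way a name becomes resolvable.
        self.java_udfs[name] = java_class

    def load_native(self, name: str) -> None:
        # Loading a native UDF no longer registers it for SQL analysis.
        self.native_udfs.add(name)

    def resolve(self, name: str, can_offload: bool = True) -> str:
        """Return "native" if the call can be offloaded, else "java"."""
        if name not in self.java_udfs:
            raise LookupError(f"undefined function: {name}")
        if can_offload and name in self.native_udfs:
            return "native"
        # The Java UDF always exists here, so fallback is always possible.
        return "java"


catalog = UdfCatalog()
catalog.load_native("my_udf")
catalog.create_temporary_function("my_udf", "com.example.MyUdf")
offloaded = catalog.resolve("my_udf")
fallback = catalog.resolve("my_udf", can_offload=False)
```

In this model, a native UDF that was loaded but never registered with CREATE TEMPORARY FUNCTION fails to resolve, which is where the proposal would break existing workloads that rely on the implicit registration.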

This is a breaking change.

marin-ma commented 2 months ago

This change may affect your existing workloads. @kecookier

FelixYBW commented 2 months ago

@kecookier Did you use Hive UDFs or Spark UDFs? For Hive UDFs, the SQL must have a CREATE TEMPORARY FUNCTION statement, and the function is registered there, so we don't need to register it on the native side.

For Spark UDFs, we still don't have a good way to catch them.

kecookier commented 2 months ago

@FelixYBW @marin-ma Thanks, we use Hive UDFs and have CREATE TEMPORARY FUNCTION statements in our ETL.

kecookier commented 2 months ago

cc @NEUpanning