databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Feature request: Support Python UDF From DBT SQL Model #603

Open · case-k-git opened 4 months ago

case-k-git commented 4 months ago

Describe the feature

A clear and concise description of what you want to happen.

Being able to use a Spark UDF from dbt would be helpful.

As discussed in dbt-spark, something like using a pre_hook would be helpful. https://github.com/dbt-labs/dbt-spark/issues/135#issuecomment-852920532

{{ config(
    pre_hook=["
        def custom_df(input):
            # do some logic
            return output

        spark.udf.register('custom_df', custom_df)
    "]
) }}

select custom_df( x ) from {{ ref('my_table') }}

or

{{ config( 
pre_hook=['dbfs:/scripts/init_functions.py']
) }}

select custom_df( x ) from {{ ref('my_table') }}
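
Today dbt hooks execute SQL only, so neither variant works as written. Conceptually, both boil down to a session-scoped registration like this PySpark sketch (custom_df and the uppercase logic are placeholders; spark is the session Databricks provides in notebooks and jobs):

from pyspark.sql.types import StringType

def custom_df(value):
    # placeholder logic
    return value.upper()

# Register the Python function so SQL in the same session can call it
spark.udf.register('custom_df', custom_df, StringType())

Once registered, any SQL in that session can call custom_df(x).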

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Using dbt Python models. Ideally, though, we want to use UDFs from dbt SQL models. https://docs.getdbt.com/docs/build/python-models

(Advanced) Use dbt Python models in a workflow

https://docs.databricks.com/en/workflows/jobs/how-to/use-dbt-in-workflows.html#advanced-use-dbt-python-models-in-a-workflow
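
For completeness, a minimal sketch of that alternative: a dbt Python model that applies the UDF directly (my_table, column x, and the uppercase logic are placeholders following the examples above):

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def model(dbt, session):
    dbt.config(materialized="table")

    # Placeholder transformation standing in for real logic.
    @udf(returnType=StringType())
    def custom_df(value):
        return value.upper()

    df = dbt.ref("my_table")
    return df.select(custom_df(df["x"]).alias("x"))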

Additional context

Please include any other relevant context here.

Running dbt in production with Python UDFs https://www.explorium.ai/blog/news-and-updates/running-dbt-production-python-udfs/

Who will this benefit?

What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.

Anyone who wants to use UDFs. Our company is migrating from Oracle PL/SQL to Databricks; if we could use UDFs, some functions would be easy to migrate. https://www.databricks.com/blog/how-migrate-your-oracle-plsql-code-databricks-lakehouse-platform

Are you interested in contributing this feature?

Let us know if you want to write some code, and how we can help.

Yes

case-k-git commented 4 months ago

Ah, maybe we can use a Databricks UDF from a macro?

Create UDF (note: we can also register it outside of dbt): https://docs.databricks.com/en/udf/unity-catalog.html#register-a-python-udf-to-uc

{% macro create_greet() %}
  CREATE OR REPLACE FUNCTION target_catalog.target_schema.greet(s STRING)
  RETURNS STRING
  LANGUAGE PYTHON
  AS $$
    return f"Hello, {s}"
  $$
{% endmacro %}
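
Note that defining the macro doesn't run it by itself; one way to execute it is from a hook, which loops back to the original pre_hook idea (a sketch, assuming the create_greet macro above; it could equally go in on-run-start in dbt_project.yml):

{{ config(
    pre_hook=["{{ create_greet() }}"]
) }}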

Read UDF: add a macro function to call the UDF.

{% macro get_greet(a) %}
  target_catalog.target_schema.greet({{ a }})
{% endmacro %}

Use UDF: call the UDF from dbt SQL.

SELECT
  {{
    get_greet(
      a="'Jone'"
    )
  }} AS greeting
FROM {{ ref('table') }}
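
With the catalog and schema above, this would compile to roughly the following (ref() resolves to whatever relation your target maps 'table' to):

SELECT
  target_catalog.target_schema.greet('Jone') AS greeting
FROM target_catalog.target_schema.table
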
Nintorac commented 2 months ago

This might already be possible, though I'm not sure.

Here is how I've done it with DuckDB:

I define a plugin here: https://github.com/Nintorac/s4_dx7/blob/main/s4_dx7/udf/__init__.py
Then I load it in the profile here: https://github.com/Nintorac/s4_dx7/blob/main/s4_dx7_dbt/profiles.yml#L13
Finally I can use it, e.g. here: https://github.com/Nintorac/s4_dx7/blob/main/s4_dx7_dbt/models/dx7_voices.sql#L3C25-L3C39
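
For anyone following along, the dbt-duckdb plugin mechanism looks roughly like the sketch below. This is an illustration of the pattern from memory, not the linked code, and shout is a made-up example function; check the files above and the dbt-duckdb docs for the real implementation:

from duckdb import DuckDBPyConnection
from dbt.adapters.duckdb.plugins import BasePlugin

class Plugin(BasePlugin):
    # dbt-duckdb invokes this for new connections, so anything
    # registered here is available to models as a SQL UDF.
    def configure_connection(self, conn: DuckDBPyConnection):
        def shout(s: str) -> str:
            return s.upper()
        # Type hints let DuckDB infer parameter and return types.
        conn.create_function("shout", shout)

The profile then points dbt-duckdb at the module defining this Plugin class, and models can call shout(...) like any built-in.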

what I'm not clear on is