dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0
852 stars 163 forks source link

Support for BigQuery User-Defined Functions (UDFs) #1289

Open dirkjonker opened 2 years ago

dirkjonker commented 2 years ago

It would be useful to be able to create User-Defined Functions (UDF) from dataform, and being able to reference them like you reference a table. Right now I create UDFs in the pre_operations section, but I don't have syntax highlighting or ability to test it. Also calling the UDFs from a query is a bit hacky now, as you need to fully qualify the function name (project_name.dataset.function_name) in order to ensure you are calling the right function.

Bonus for being able to unit-test Javascript UDFs!

lewish commented 2 years ago

You can kind of do this already:

// my_udf.sqlx
config {
  type: "operations",
  hasOutput: true,
}

create function ${self()}(x INT64) ...

And then you can reference it:

// my_table.sqlx
config { type: "table" }
select ${ref("my_udf")}(column) from ...

This is a bit of a workaround, but it does mean the UDFs are part of the graph, by storing them as permanent functions inside BigQuery.

This doesn't address your points on unit-testing however!

dirkjonker commented 2 years ago

@lewish thanks, that's really useful!

Now I'm just missing the syntax highlighting and automatic formatting that respects the Javascript in the UDF string ;)

dirkjonker commented 2 years ago

I do get query validation warnings: "Some dependencies don't yet exist in the warehouse". It seems like Dataform doesn't know that the functions exist, even though it can successfully create them. It's just a warning though, so it doesn't block any dependent resources from being created, but it's a false warning which is inconvenient.

sanimesa commented 1 year ago

Just tried this, but dataform did not recognize the function until it was executed and persisted in the database.

davidsr2r commented 1 year ago

I was looking into this as a data engineer at a company using dbt, who helped develop our dbt tooling and maintain it; this would be a somewhat compelling reason to make a switch from dbt, since dbt also lacks this feature. We use UDFs to share functionality across models. If dataform had first class UDF support and a proper dry run feature (dry running incremental models, for example) it'd be an easy sell to make the switch.

Ekrekr commented 8 months ago

To expand on this, I would love to see some examples of where the functionality of Javascript UDFs can't be achieved without them.

Having logic computed at runtime in BigQuery removes a lot of the benefits of the determinism of a compiled Dataform graph - so I'm trying to understand the motivations of them better.