Open dirkjonker opened 2 years ago
You can kind of do this already:
// my_udf.sqlx
config {
type: "operations",
hasOutput: true,
}
create function ${self()}(x INT64) ...
And then you can reference it:
// my_table.sqlx
config { type: "table" }
select ${ref("my_udf")}(column) from ...
This is a bit of a workaround, but it does mean the UDFs are part of the graph, by storing them as permanent functions inside BigQuery.
This doesn't address your points on unit-testing however!
@lewish thanks, that's really useful!
Now I'm just missing the syntax highlighting and automatic formatting that respects the Javascript in the UDF string ;)
I do get query validation warnings: "Some dependencies don't yet exist in the warehouse". It seems like Dataform doesn't know that the functions exist, even though it can successfully create them. It's just a warning though, so it doesn't block any dependent resources from being created, but it's a false warning which is inconvenient.
Just tried this, but dataform did not recognize the function until it was executed and persisted in the database.
I was looking into this as a data engineer at a company using dbt, who helped develop our dbt tooling and maintain it; this would be a somewhat compelling reason to make a switch from dbt, since dbt also lacks this feature. We use UDFs to share functionality across models. If dataform had first class UDF support and a proper dry run feature (dry running incremental models, for example) it'd be an easy sell to make the switch.
To expand on this, I would love to see some examples of where the functionality of Javascript UDFs can't be achieved without them.
Having logic computed at runtime in BigQuery removes a lot of the benefits of the determinism of a compiled Dataform graph - so I'm trying to understand the motivations of them better.
It would be useful to be able to create User-Defined Functions (UDF) from dataform, and being able to reference them like you reference a table. Right now I create UDFs in the
pre_operations
section, but I don't have syntax highlighting or ability to test it. Also calling the UDFs from a query is a bit hacky now, as you need to fully qualify the function name (project_name.dataset.function_name
) in order to ensure you are calling the right function.Bonus for being able to unit-test Javascript UDFs!