dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0
850 stars 162 forks source link

Run unit test on query with pre operation #1472

Open AnOtterGithubUser opened 1 year ago

AnOtterGithubUser commented 1 year ago

Hello, I have a query of type table with a pre_operations block that looks like that:

pre_operations {
    DECLARE test_date DEFAULT (
        ${when(
            dataform.projectConfig.vars.test_date==="", 
            `2020-01-01`, 
            `DATE('dataform.projectConfig.vars.test_date')`
            )
        }
    );
}

The variable is then used in the query, ex: SELECT * FROM table WHERE date >= test_date
Basically, I want my users to be able to provide the variable test_date as a compilation variable in command line, so they don't have to modify the code directly for their experiments. By default the variable is "" in dataform.json.
I have a unit test for the query but it fails with the following message:
Error thrown: Unrecognized name: test_date at [20:20]
It seems that the test does not run the pre_operations block. Is there any way to change this behaviour ? If not, any way to declare the variable test_date directly in the test ?
Thanks 🙏
PS: I saw that post: https://towardsdatascience.com/unit-tests-for-sql-scripts-with-dependencies-in-dataform-847133b803b7
But it seems unnecessarily complex for a small test like that. Plus it does not use a test type and has to be run with a specific tag (so no point in the dataform test command anymore)

anirvanbasu commented 1 year ago

Try doing this:

In dataform.js, declare the custom variable as:

"vars" : { "myDateVar" : "2020-01-01" }

Then, modify your test SQL like:

SELECT * FROM table WHERE date >= dataform.projectConfig.vars.myDateVar

ttogola commented 1 year ago

Also ran into this, and @anirvanbasu 's suggestion is not sufficient. In my case I'm using pre_operations to define a temporary sql udf that is used in my table query.

Since the query depends on the udf being defined, the unit test fails because pre_operations is not run.

Ekrekr commented 7 months ago

Hi all,

I'm trying to understand why the Dataform variables can't be passed directly to the call sites, rather than declared in pre_operations or other ways in BQ state using UDFs.

ttogola commented 7 months ago

@Ekrekr as mentioned my case is defining a UDF in the pre_operations to be used by the main query. A UDF can't be replaced by config vars.

See this example from the gcp docs https://cloud.google.com/dataform/docs/table-settings#execute-sql-before-table

Trying to test a task that uses such a config will fail.