dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0
834 stars 161 forks source link

Using Ref and schema variables inside a SQLX template #1611

Open vcetinick opened 9 months ago

vcetinick commented 9 months ago

I am trying to make the schema configuration more dynamic by using "vars" in the dataform.json property as a way to control which schema to use based on the value of the variable. Goal is to have to the definition of schema set depending on when environment being used at execution time.

The SQLX template looks like this

config {
  schema: dataform.projectConfig.vars.env_schema,
  type: "view"
}

select *  from ${ref("users_sys_1")} u1
union all 
select *  from ${ref("users_sys_2")} u2

This template is simply trying union two tables together (users_sys_1 and users_sys_2), however, there seems to be something funny happening in the resolution for the ref command because the compiled query becomes

select *  from `wh-proj1.${dataform.projectConfig.vars.env_schema}.users_sys_1` u1
union all 
select *  from `wh-proj1.${dataform.projectConfig.vars.env_schema}.users_sys_2` u2

I would expect the value to be interpolated from the config.schema block, but instead it seems to have bled through. E.g.

select *  from `wh-proj1.ds_staging.users_sys_1` u1
union all 
select *  from `wh-proj1.ds_staging.users_sys_2` u2

Because the interpolation is not happening, I get the error

Invalid project ID 'wh-proj1.${dataform.projectConfig.vars'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash..

Is this correct behaviour?

vcetinick commented 9 months ago

In addition, I have noticed that if I include the schema in the ref function, I get the desired behaviour. e.g.

config {
  schema: dataform.projectConfig.vars.env_schema,
  type: "view"
}

select *  from ${ref(dataform.projectConfig.vars.env_schema, "users_sys_1")} u1
union all 
select *  from ${ref(dataform.projectConfig.vars.env_schema, "users_sys_2")} u2

Now the SQL is interpolated properly:

select *  from `wh-proj1.ds_staging.users_sys_1` u1
union all 
select *  from `wh-proj1.ds_staging.users_sys_2` u2
BenBirt commented 9 months ago

Interesting! This is definitely not correct/expected. We will look into it.

vilcajoal commented 2 months ago

Hi @BenBirt and @vcetinick I ran into this problem in my sqlx code I do allow myself to interpolate using "vars" in

CREATE OR REPLACE TABLE ${dataform.projectConfig.defaultDatabase}.${dataform.projectConfig.vars.dataset_test}.dim_categories

But it doesn't work for me in ref()

${ref(dataform.projectConfig.vars.dataset_raw, "raw_categories")}

I get Undefined properties cannot be read (reading 'get') in the data form compiler.

Is there any news regarding this topic?