dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0
215 stars 149 forks source link

[Feature] Implicitly cast nested `STRUCT` properties during unit test fixture generation #1184

Open edudosaara opened 5 months ago

edudosaara commented 5 months ago

Is this a new bug in dbt-bigquery?

Current Behavior

Unit testing models is fine at row level. But when dbt tries to create a fixture for a nested struct, it does not safe cast all the missing properties. Even the nullable properties of any struct need to be inserted manually at the test in order to make it run without errors.

Expected Behavior

When compiling unit tests, dbt's fixture generation should create safe casted nulls recursively for every missing property of nested structs.

Steps To Reproduce

  1. Create this GBQ table:
    CREATE TABLE gbq_dataset.example_table(
    id INT NOT NULL,
    element_name VARCHAR(255),
    content ARRAY<STRUCT<name VARCHAR(255), description VARCHAR(255), date TIMESTAMP>>
    PRIMARY KEY (id)
    );
  2. Create this simple dbt model:
    
    {{
    config(materialized='table')
    }}

SELECT * FROM {{ source('gbq_dataset', 'example_table') }}

3. Create this unit_test.yml:
```yml
unit_tests:
  - name: example_model_refresh
    model: example_model
    given:
      - input: source('gbq_dataset', 'example_table')
        rows:
          - {
               id: 1,
               element_name: 'test',
               content: ['struct("John" as name)']
            }
    expect:
      rows:
          - {
               id: 1,
               element_name: 'test',
               content: ['struct("John" as name)']
            }
  1. The resulting compiled SQL will be like this:

    
    /* {"app": "dbt", "dbt_version": "1.8.0b1", "profile_name": "gbq_project", "target_name": "dev", "node_id": "unit_test.gbq_project.example.example_model_refresh"} */
    
    create or replace table `gbq_project`.`gbq_dataset`.`example_model_refresh__dbt_tmp`
    
    OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)
    )
    as (
      select * from (

with dbtcte__example_table as (

-- Fixture for example_table select safe_cast(1 as NUMERIC) as id, safe_cast('test' as STRING) as element_name, (select array_agg(safe_cast(i as struct<name string, description string, date timestamp>)) from unnest([ struct("John" as name) ]) i) as content ) SELECT * FROM dbtcte__example_table ) as __dbt_sbq where false and current_timestamp() = current_timestamp() limit 0

);
5. The error below is shown:
`Invalid cast from STRUCT<name STRING> to STRUCT<name STRING, description STRING, date TIMESTAMP> at [23:37]`

### Relevant log output

_No response_

### Environment

```markdown
- OS: macOS Sonoma 14.4.1
- Python: 3.12.2
- dbt-core: v1.8.0-b1
- dbt-bigquery: v1.8.0-b2

Additional Context

No response

dbeatty10 commented 5 months ago

Thanks for reporting this @edudosaara !

After getting insights from @MichelleArk, we believe we should probably document this as a known limitation, and I've opened https://github.com/dbt-labs/docs.getdbt.com/issues/5307 as a result.

All fields in a BigQuery struct need to be specified in a unit test -- it's not currently possible to use only a subset of columns in a struct.

But we agree that we'd like to support this.

To support this, it would probably take writing some struct-specific handling of missing fields in the safe_cast method.

Workaround

Include all the fields in a BigQuery struct within the unit test.