dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0
214 stars 149 forks source link

[CT-1631] [Bug] dbt snapshot fails on schema change in nested fields #383

Open dashnak90 opened 1 year ago

dashnak90 commented 1 year ago

Is this a new bug in dbt-bigquery?

Current Behavior

I get datatype error when trying to run snapshot on a table, where there was a schema change in a nested field(column added/removed) since the last snapshot run.

Value has type STRUCT<module STRING, reference_entity_class STRING, reference_table STRING, ...> which cannot be inserted into column created_by, which has type STRUCT<module STRING, reference_entity_class STRING, reference_table STRING, ...> at [16:33]

Attached on the left is the updated schema of the tmp table to be merged, on the right is the original table Screenshot 2022-11-05 at 08 51 56 (1)

Expected Behavior

Snapshot table is updated successfully and includes new changes in schema of the nested fields

Steps To Reproduce

  1. Create initial snapshot of a table that contains nested column
  2. Add/remove one more column to/from the nested column of the table
  3. Run dbt snapshot

Relevant log output

`Value has type STRUCT<module STRING, reference_entity_class STRING, reference_table STRING, ...> which cannot be inserted into column created_by, which has type STRUCT<module STRING, reference_entity_class STRING, reference_table STRING, ...> at [16:33]`

Environment

- OS:macOS-12.6-arm64-arm-64bit
- Python:3.9.12
- dbt-core:1.3.0
- dbt-bigquery:1.3.0

Additional Context

No response

Fleid commented 1 year ago

I can reproduce, but I don't think that's a bug.

BigQuery doesn't support schema mutations on record fields.

From a user standpoint it looks like the field is of type STRUCT, but from the database standpoint the type is actually STRUCT<module STRING, reference_entity_class STRING, reference_table STRING> (let's focus on your first 3 properties). If you want to insert a new value of type STRUCT<module STRING, reference_entity_class STRING> (missing that last field), for BigQuery it's as invalid as trying to insert a string into an integer column.

I know records come with the expectation of being schema-on-read, but that's not the case for BigQuery.

Since that's the exact role of snapshots in dbt, dealing with mutations, I'm not against adding that to the feature set. I'll re-classify as enhancement. I don't think we'll be able to prioritize in the short term, so I'll flag it as help_wanted.

Thanks a lot for this issue @dashnak90

HansalShah007 commented 1 year ago

Has there been an update in dbt snapshots for solving this issue? @Fleid

Fleid commented 1 year ago

@HansalShah007 the latest is my comment just above :)

The core issue is that BigQuery doesn't accept changes in nested fields. We won't be able to add a workaround in dbt in the short term, but definitely accepting contributions on the topic!