dbt-labs / docs.getdbt.com

The code behind docs.getdbt.com
https://docs.getdbt.com/
Apache License 2.0
118 stars 940 forks source link

Include examples of `unique_key` as a list #4642

Open dbeatty10 opened 10 months ago

dbeatty10 commented 10 months ago

Contributions

Link to the page on docs.getdbt.com requiring updates

https://docs.getdbt.com/reference/resource-configs/unique_key

What part(s) of the page would you like to see updated?

unique_key is only discussed in the context of snapshots, but it is also a "key" component for incremental models!

Depending on the context, it can be a column name, a SQL expression (snapshots only), or a list (incremental models only). And for at least the delete+insert incremental strategy, it doesn't have to uniquely match to only a single row in a data set.

What happens when…

### Error messages for `merge` when `unique_key` is not unique #### dbt-postgres ``` 22:24:21 Database Error in model merge (models/merge.sql) MERGE command cannot affect row a second time HINT: Ensure that not more than one source row matches any one target row. ``` #### dbt-redshift ``` 22:24:38 Database Error in model merge (models/merge.sql) Found multiple matches to update the same tuple. ``` #### dbt-snowflake ``` 22:24:50 Database Error in model merge (models/merge.sql) 100090 (42P18): Duplicate row detected during DML action Row Values: [redacted] ``` #### dbt-bigquery ``` 00:21:20 Database Error in model merge (models/merge.sql) UPDATE/MERGE must match at most one source row for each target row ``` #### dbt-databricks ``` 22:25:16 Runtime Error in model merge (models/merge.sql) Cannot perform Merge as multiple source rows matched and attempted to modify the same target row in the Delta table in possibly conflicting ways. By SQL semantics of Merge, when multiple source rows match on the same target row, the result may be ambiguous as it is unclear which source row should be used to update or delete the matching target row. You can preprocess the source table to eliminate the possibility of multiple matches. Please refer to https://docs.databricks.com/delta/merge.html#merge-error ```
### Effects on snapshots when `unique_key` is not unique #### dbt-postgres dbt-postgres doesn't use a `merge` statement for snapshots, but use deletes and inserts instead. So it will succeed, but the snapshot will contain bad data when `unique_key` is not unique. #### dbt-redshift dbt-redshift doesn't use a `merge` statement for snapshots, but use deletes and inserts instead. So it will succeed, but the snapshot will contain bad data when `unique_key` is not unique. #### dbt-snowflake dbt-snowflake uses a `merge` statement for snapshots, so it fails when `unique_key` is not unique: ``` 00:31:07 Database Error in snapshot snapshot (snapshots/snapshot.sql) 100090 (42P18): Duplicate row detected during DML action Row Values: [redacted] ``` #### dbt-bigquery dbt-bigquery uses a `merge` statement for snapshots, so it fails when `unique_key` is not unique: ``` 00:31:30 Database Error in snapshot snapshot (snapshots/snapshot.sql) UPDATE/MERGE must match at most one source row for each target row ``` #### dbt-databricks dbt-databricks uses a `merge` statement for snapshots, so it fails when `unique_key` is not unique: ``` 01:17:58 Runtime Error in snapshot snapshot (snapshots/snapshot.sql) Cannot perform Merge as multiple source rows matched and attempted to modify the same target row in the Delta table in possibly conflicting ways. By SQL semantics of Merge, when multiple source rows match on the same target row, the result may be ambiguous as it is unclear which source row should be used to update or delete the matching target row. You can preprocess the source table to eliminate the possibility of multiple matches. Please refer to https://docs.databricks.com/delta/merge.html#merge-error ```

Additional information

Here's a sample of pages that discuss the purpose and behavior of unique_key within either snapshots or incremental models:

Instigating context

By the way, I noticed that this list option is missing from the docs here: https://docs.getdbt.com/reference/resource-configs/unique_key#use-a-combination-of-two-columns-as-a-unique-key

_Originally posted by @cdabel in https://github.com/dbt-labs/dbt-core/pull/4618#discussion_r959860372_

Discovered while researching https://github.com/dbt-labs/dbt-adapters/issues/150.

runleonarun commented 9 months ago

but it is also a "key" component for incremental models! 😆