### What part(s) of the page would you like to see updated?
`unique_key` is only discussed in the context of snapshots, but it is also a "key" component for incremental models!

Depending on the context, it can be a column name, a SQL expression (snapshots only), or a list of column names (incremental models only). And for at least the `delete+insert` incremental strategy, it doesn't have to match only a single row in a data set.
What happens when…

- `unique_key` is a column name?
  - ✅ Snapshots
  - ✅ Incremental models
- `unique_key` is an expression?
  - ✅ Snapshots
  - ❌ Incremental models (error)
- `unique_key` is a list of column names?
  - ❌ Snapshots (error)
  - ✅ Incremental models
- `unique_key` is not actually unique?
  - ❌ Snapshots (bad data)
  - Incremental models:
    - ✅ `append`
    - ✅ `delete+insert`
    - ✅ `insert_overwrite`
    - ❌ `merge` (error)
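For illustration, here is a minimal sketch of an incremental model config (hypothetical model and column names) using `unique_key` as a list, which only incremental models accept:

```sql
-- models/orders_by_day.sql (hypothetical model name)
-- A list of column names is only valid for incremental models;
-- passing a list to a snapshot raises an error instead.
{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key=['order_date', 'region']
    )
}}

select
    order_date,
    region,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by 1, 2
```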
### Error messages for `merge` when `unique_key` is not unique
#### dbt-postgres
```
22:24:21 Database Error in model merge (models/merge.sql)
MERGE command cannot affect row a second time
HINT: Ensure that not more than one source row matches any one target row.
```
#### dbt-redshift
```
22:24:38 Database Error in model merge (models/merge.sql)
Found multiple matches to update the same tuple.
```
#### dbt-snowflake
```
22:24:50 Database Error in model merge (models/merge.sql)
100090 (42P18): Duplicate row detected during DML action
Row Values: [redacted]
```
#### dbt-bigquery
```
00:21:20 Database Error in model merge (models/merge.sql)
UPDATE/MERGE must match at most one source row for each target row
```
#### dbt-databricks
```
22:25:16 Runtime Error in model merge (models/merge.sql)
Cannot perform Merge as multiple source rows matched and attempted to modify the same
target row in the Delta table in possibly conflicting ways. By SQL semantics of Merge,
when multiple source rows match on the same target row, the result may be ambiguous
as it is unclear which source row should be used to update or delete the matching
target row. You can preprocess the source table to eliminate the possibility of
multiple matches. Please refer to
https://docs.databricks.com/delta/merge.html#merge-error
```
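All of these errors stem from the same condition: multiple source rows matching a single target row. A common workaround (a sketch, assuming a hypothetical `updated_at` tiebreaker column and a hypothetical `stg_source` upstream model) is to deduplicate the source on the key before the merge runs:

```sql
-- Keep only the latest row per unique_key value so the merge
-- matches at most one source row per target row.
with ranked as (
    select
        *,
        row_number() over (
            partition by id            -- the unique_key column
            order by updated_at desc   -- hypothetical tiebreaker
        ) as row_num
    from {{ ref('stg_source') }}       -- hypothetical upstream model
)

select * from ranked
where row_num = 1
```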
### Effects on snapshots when `unique_key` is not unique
#### dbt-postgres
dbt-postgres doesn't use a `merge` statement for snapshots; it uses deletes and inserts instead. So it will succeed, but the snapshot will contain bad data when `unique_key` is not unique.
#### dbt-redshift
dbt-redshift doesn't use a `merge` statement for snapshots; it uses deletes and inserts instead. So it will succeed, but the snapshot will contain bad data when `unique_key` is not unique.
#### dbt-snowflake
dbt-snowflake uses a `merge` statement for snapshots, so it fails when `unique_key` is not unique:
```
00:31:07 Database Error in snapshot snapshot (snapshots/snapshot.sql)
100090 (42P18): Duplicate row detected during DML action
Row Values: [redacted]
```
#### dbt-bigquery
dbt-bigquery uses a `merge` statement for snapshots, so it fails when `unique_key` is not unique:
```
00:31:30 Database Error in snapshot snapshot (snapshots/snapshot.sql)
UPDATE/MERGE must match at most one source row for each target row
```
#### dbt-databricks
dbt-databricks uses a `merge` statement for snapshots, so it fails when `unique_key` is not unique:
```
01:17:58 Runtime Error in snapshot snapshot (snapshots/snapshot.sql)
Cannot perform Merge as multiple source rows matched and attempted to modify the same
target row in the Delta table in possibly conflicting ways. By SQL semantics of Merge,
when multiple source rows match on the same target row, the result may be ambiguous
as it is unclear which source row should be used to update or delete the matching
target row. You can preprocess the source table to eliminate the possibility of
multiple matches. Please refer to
https://docs.databricks.com/delta/merge.html#merge-error
```
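For completeness, here is a sketch of a snapshot config (hypothetical snapshot, source, and column names) using a SQL expression as `unique_key`, which only snapshots accept. Note that the expression itself must evaluate to a unique value per row, or the merge-based adapters above will fail the same way:

```sql
-- snapshots/orders_snapshot.sql (hypothetical snapshot name)
{% snapshot orders_snapshot %}

{{
    config(
        target_schema='snapshots',
        strategy='timestamp',
        updated_at='updated_at',
        unique_key="order_id || '-' || region"  -- expression: snapshots only
    )
}}

select * from {{ source('app', 'orders') }}

{% endsnapshot %}
```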
### Additional information
Here's a sample of pages that discuss the purpose and behavior of unique_key within either snapshots or incremental models:
### Contributions

### Link to the page on docs.getdbt.com requiring updates
https://docs.getdbt.com/reference/resource-configs/unique_key
### Instigating context
_Originally posted by @cdabel in https://github.com/dbt-labs/dbt-core/pull/4618#discussion_r959860372_
Discovered while researching https://github.com/dbt-labs/dbt-adapters/issues/150.