dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.52k stars 1.58k forks source link

[Bug] Duplicate row detected during DML action #10432

Open edwrand opened 1 month ago

edwrand commented 1 month ago

Is this a new bug in dbt-core?

Current Behavior

I have an incremental model pulling from a collection of ephemeral models. I have defined a unique key (surrogate key of several columns). Using the MERGE strategy on the unique key I have defined (I have confirmed this key is unique), I get the error Duplicate row detected during DML action.

This confuses me because I have confirmed my key is unique and this how are these duplicate rows when MERGing?

Expected Behavior

I expected my MERGE strategy to incrementally merge new records based on my unique surrogate key. Instead this model fails with the error "Duplicate row detected during DML action"

Steps To Reproduce

  1. I am not sure how to instruct someone to replicate this as we have a 1,000+ model warehouse

Relevant log output

18:02:10
18:02:10  Completed with 1 error and 0 warnings:
18:02:10  Database Error in model fact_loan_histories (models/core_data_warehouse/fact_loan_histories/fact_loan_histories.sql)
18:02:10    100090 (42P18): Duplicate row detected during DML action
18:02:10    Row Values: ["e250a04d82a2ca0f5f7c37f8d0dd34b4", 2, 4, "b3bc5b1759a83c996cfa3e568ac2734c", "6fa2c7f681cc33eafa32c3caa8bf086a", 1658817, "e08dde133e154d7b7be707b407d917d1", 8, 20240709, "9b837f2b7ef9ce3dfa6121bafabceff2", 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19000101, 0, 0, 0, 0, 0, 0, 20250314, 20240605, 0, 56, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

Environment

- OS: MacOS 13.4.1 (c) 
- Python: using DBT cloud
- dbt:1.3

Which database adapter are you using with dbt?

snowflake

Additional Context

dbt-error-logs.odt

dbeatty10 commented 1 month ago

Thanks for reaching out @edwrand !

Did you confirm that the unique_key has unique values for both of the following?

  1. The existing data in the table for the incremental model
  2. The new data that is to be added during the next incremental run
edwrand commented 1 month ago

yes both places are unique but I still see this issue

dbeatty10 commented 1 month ago

@edwrand Unfortunately, I am not able to reproduce this error without seeing your data and the details of your incremental model.

If you're able to share a simple model and example data that produces this error, that would be ideal. But barring that, I'd suggest that you find the SQL that is causing the "Duplicate row detected during DML action" and trying to troubleshoot it in the Snowflake UI.

Here's how you can find that SQL:

Search within your target directory for a file named like this:

target/run/YOUR_PROJECT_NAME/models/.../fact_loan_histories.sql

You'll see some code that generally looks like this:

    merge into analytics_dev.dbt_dbeatty.fact_loan_histories as DBT_INTERNAL_DEST
        using analytics_dev.dbt_dbeatty.fact_loan_histories__dbt_tmp as DBT_INTERNAL_SOURCE
        on (
                DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
            )

    when matched then update set
        "ID" = DBT_INTERNAL_SOURCE."ID","UPDATED_AT" = DBT_INTERNAL_SOURCE."UPDATED_AT"

    when not matched then insert
        ("ID", "UPDATED_AT")
    values
        ("ID", "UPDATED_AT")

;
    commit;

Then you can try running that SQL in the Snowflake UI. If it gives the "Duplicate row detected during DML action", then you can try to determine the source of the error. There should only be two possibilities:

  1. There's a duplicate in the data somewhere
  2. There's a bug in the SQL logic

Let us know what you find out!