dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
10.02k stars 1.63k forks source link

[Feature] Emit a more specific error when there are duplicate columns in a model with an enforced contract #10247

Open rchui opened 6 months ago

rchui commented 6 months ago

Is this a new bug in dbt-core?

Current Behavior

When you have two columns with the same name/type in the same model and that model has a contract enabled it produces this ambiguous contract error:

  This model has an enforced contract that failed.
  Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.

  | column_name | definition_type | contract_type | mismatch_reason |
  | ----------- | --------------- | ------------- | --------------- |

  > in macro assert_columns_equivalent (macros/relations/column/columns_spec_ddl.sql)
  > called by macro default__get_assert_columns_equivalent (macros/relations/column/columns_spec_ddl.sql)
  > called by macro get_assert_columns_equivalent (macros/relations/column/columns_spec_ddl.sql)
  > called by macro snowflake__create_table_as (macros/relations/table/create.sql)
  > called by macro create_table_as (macros/relations/table/create.sql)
  > called by macro statement (macros/etc/statement.sql)
  > called by macro materialization_incremental_snowflake (macros/materializations/incremental.sql)

Expected Behavior

I would expect it to raise an error pointing out that a duplicate column exists.

Steps To Reproduce

Create a model with two of the same name/type:

SELECT TRUE::BOOLEAN AS is_deleted, TRUE::BOOLEAN AS is_deleted

Create a schema with only one of the columns:

- name: model
  config: {contract: {enabled: true}}
  columns:
    - name: is_deleted
      data_type: boolean

Run the model dbt run --select <model>

Relevant log output

No response

Environment

- OS: Macos 14.4.1
- Python: 3.11
- dbt: 1.8.1

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

dbeatty10 commented 6 months ago

Thanks for reporting this @rchui !

I saw a similar error message across a variety of warehouses when I created a model and contract like you described. For compatibility while testing across different data warehouses, I used the date data type because it is the most commonly supported.

Here's the files I used as a reprex:

models/my_model.sql

select
    cast('2020-01-01' as date) as created_at,
    cast('2020-01-01' as date) as created_at

models/_models.yml

models:
  - name: my_model
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: created_at
        data_type: date

Interestingly, when I ran this model without enforcing a contract, all warehouses I tested gave an error similar to the following:

23:51:11    Database Error in model my_model (models/my_model.sql)
  002025 (42S21): SQL compilation error:
  duplicate column name 'CREATED_AT'
  compiled Code at target/run/my_project/models/my_model.sql

Summary

This doesn't look like a bug to me because the contract fails as desired instead of succeeding, and the error message includes a call-out to check the number of columns:

This model has an enforced contract that failed. Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.

Re-categorizing this as a feature request accordingly.

But agreed with your suggestion that it would be nice if dbt included an error message similar to one of these:

This isn't something that would be high priority for us so labeling as "help wanted" for anyone that wants to raise a pull request with a more precise error message.

github-actions[bot] commented 5 days ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.