dbt-labs / dbt-adapters

Apache License 2.0
22 stars 33 forks source link

[CT-3362] [Bug] dbt-postgres unable to perform --full-refresh on incremental model with primary key constraint #224

Open mcooper-pi opened 10 months ago

mcooper-pi commented 10 months ago

Is this a new bug in dbt-core?

Current Behavior

When running the command dbt-postgres build --full-refresh --select <model name>.sql I get the following error:

Database Error in model <model name> (<model name> .sql)
  relation "<primary key constraint name>" already exists

Upon inspecting the SQL code in dbt.log, I see that the --full-refresh command is building an exact replica of my table with the suffix __dbt_tmp. The error appears to stem from this table creating using the exact constraint name as my existing table. Postgres requires all constraint names to be unique regardless of the table it exists on, so this replica table does not create and throws the error.

Expected Behavior

I would expect the incremental table to be re-populated with data as if is_incremental() returns False instead of True.

Steps To Reproduce

  1. In postgres, create an incremental model and define a named primary key constraint in the models.yaml config file.
  2. Run dbt-postgres build and build the model
  3. Run dbt-postgres build --full-refresh
  4. See error Database Error in model <model name> (<model name> .sql) relation "<primary key constraint name>" already exists

Relevant log output

(Names and database objects have been redacted, but everything else is taken directly from the logs)

[0m11:40:47.791173 [debug] [Thread-1 (]: On model.project.model_name: /* {"app": "dbt", "dbt_version": "1.5.2", "profile_name": "meltano", "target_name": "staging", "node_id": "model.project.model_name"} */

  create  table "database"."schema"."int__table_name__dbt_tmp"

  (
    table_pk bigint not null,
    additional columns...

    constraint pk_int__table_name primary key (table_pk)
    )
 ;
    insert into "database"."schema"."int__table_name__dbt_tmp" (
      table_pk, additional columns...
    )

  (

    select table_pk, additional columns...
    from (
        model logic here

    ) as model_subq
  );

11:40:47.903708 [debug] [Thread-1 (]: Postgres adapter: Postgres error: relation "pk_int__table_name" already exists

11:40:47.908379 [debug] [Thread-1 (]: On model.project.model_name: ROLLBACK
11:40:48.021542 [debug] [Thread-1 (]: Timing info for model.project.model_name (execute): 11:40:46.644505 => 11:40:48.020919
11:40:48.025104 [debug] [Thread-1 (]: On model.project.model_name: Close
11:40:48.038370 [debug] [Thread-1 (]: Database Error in model model_name (models/intermediate/payment_transactions/model_name.sql)
  relation "pk_int__table_name" already exists
  compiled Code at /dir/model_name.sql
11:40:48.040750 [error] [Thread-1 (]: 1 of 33 ERROR creating sql incremental model paw_crucible.model_name  [ERROR in 1.43s]
11:40:48.044176 [debug] [Thread-1 (]: Finished running node model.project.model_name

Environment

- OS: MacOS 13.2.1
- Python: 3.11.3
- dbt: 1.5.2 (dbt-postgres)

Which database adapter are you using with dbt?

postgres

Additional Context

No response

dbeatty10 commented 10 months ago

Thanks for raising this @mcooper-pi.

Could you share the code that you are using to configure your primary key constraint?

I gave it a quick shot, and I didn't get the same SQL or error as you. See below for details.

My project

models/my_model.sql

{{ config(materialized="table") }}

select 1 as id

models/_models.yml

models:
  - name: my_model
    config:
      contract:
        enforced: true
    constraints:
      - type: primary_key
        columns: ["id"]
    columns:
      - name: id
        data_type: integer

or alternatively:

models/_models.yml

models:
  - name: my_model
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: integer

Commands

dbt build
dbt build --full-refresh

Output


  create  table "postgres"."dbt_dbeatty"."my_model__dbt_tmp"

  (
    id integer primary key

    )
 ;
    insert into "postgres"."dbt_dbeatty"."my_model__dbt_tmp" (
      id
    )

  (

    select id
    from (

select 1 as id
    ) as model_subq
  );

alter table "postgres"."dbt_dbeatty"."my_model" rename to "my_model__dbt_backup"
alter table "postgres"."dbt_dbeatty"."my_model__dbt_tmp" rename to "my_model"
mcooper-pi commented 10 months ago

Taken from my models/_models.yml file (object names changed):

models:
  - name: int__incremental_model
    description: model desc here
    config:
      contract:
        enforced: true
      materialized: incremental
      unique_key: unq_column
      on_schema_change: append_new_columns
    constraints:
      - type: primary_key
        columns:
          - column_pk
        name: pk_constraint_name
    columns:
      - name: column_pk
        data_type: bigint
        constraints:
          - type: not_null
        tests:
          - not_null
          - unique

I think the only material difference is that I am setting a specific name for the constraint.

dbeatty10 commented 10 months ago

How does it behave if you remove name: pk_constraint_name?

mcooper-pi commented 10 months ago

That appears to do the trick. That model also has a named unique constraint and a named check constraint, and when running it after removing the primary key constraint name, it errors out on the unique constraint name. It's interesting though - if I remove both name properties for the primary key and unique constraints, it builds successfully. The behavior does not persist with the check name.

Looking at the database table in DataGrip, the primary key and unique constraints are both categorized as keys, and the check constraint is just a check. I have to assume the check doesn't have the same uniqueness restriction as the keys do.

dbeatty10 commented 10 months ago

Thanks for checking that @mcooper-pi

Ideally, the error message wouldn't be this:

21:31:55  Database Error in model my_model (models/my_model.sql)
21:31:55    relation "pk_constraint_abc" already exists

But would be something like this instead:

21:31:55  Database Error in model my_model (models/my_model.sql)
21:31:55    Constraint with name "pk_constraint_abc" already exists. Consider removing the `name` property from this constraint to avoid this error.

Workaround

Remove any human-friendly constraint names from primary keys definitions that look like this:

    constraints:
      - type: primary_key
        columns: [<first_column>, <second_column>, ...]
        name: human_friendly_name

So this instead:

    constraints:
      - type: primary_key
        columns: [<first_column>, <second_column>, ...]
        # Removed the line below to prevent Database Error in model my_model (models/my_model.sql)
        # name: human_friendly_name
### Reprex #### Model definition `models/my_model.sql` ```sql {{ config(materialized="table") }} select 1 as id ``` #### YAML ```yaml models: - name: my_model config: contract: enforced: true constraints: - type: primary_key columns: - id name: pk_constraint_abc columns: - name: id data_type: integer ``` #### Commands ```shell dbt build dbt build --full-refresh ``` #### Output ``` $ dbt build --full-refresh 21:31:51 Running with dbt=1.5.2 21:31:51 Registered adapter: postgres=1.5.2 21:31:52 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 307 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups 21:31:52 21:31:52 Concurrency: 5 threads (target='postgres') 21:31:52 21:31:52 1 of 1 START sql table model dbt_dbeatty.my_model .............................. [RUN] 21:31:52 1 of 1 OK created sql table model dbt_dbeatty.my_model ......................... [INSERT 0 1 in 0.15s] 21:31:52 21:31:52 Finished running 1 table model in 0 hours 0 minutes and 0.38 seconds (0.38s). 21:31:52 21:31:52 Completed successfully 21:31:52 21:31:52 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1 $ dbt build --full-refresh 21:31:55 Running with dbt=1.5.2 21:31:55 Registered adapter: postgres=1.5.2 21:31:55 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 307 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups 21:31:55 21:31:55 Concurrency: 5 threads (target='postgres') 21:31:55 21:31:55 1 of 1 START sql table model dbt_dbeatty.my_model .............................. [RUN] 21:31:55 1 of 1 ERROR creating sql table model dbt_dbeatty.my_model ..................... [ERROR in 0.14s] 21:31:55 21:31:55 Finished running 1 table model in 0 hours 0 minutes and 0.32 seconds (0.32s). 21:31:55 21:31:55 Completed with 1 error and 0 warnings: 21:31:55 21:31:55 Database Error in model my_model (models/my_model.sql) 21:31:55 relation "pk_constraint_abc" already exists 21:31:55 compiled Code at target/run/my_project/models/my_model.sql 21:31:55 21:31:55 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1 ```
mcooper-pi commented 10 months ago

@dbeatty10, so it sounds like it's working as you would expect it to be, but the error message isn't as specific enough? Or are you saying that's just for a workaround until the named constraint will work?

dbeatty10 commented 10 months ago

@dbeatty10, so it sounds like it's working as you would expect it to be, but the error message isn't as specific enough? Or are you saying that's just for a workaround until the named constraint will work?

Both:

The first one is probably easier to implement and can be done independently of the second. The second could be trickier and is probably not be a high priority fix (since there is a reasonable workaround and low impact to be without a constraint name)

Acceptance criteria

Potential implementation

Maybe we we just make the constraint name as only dbt metadata for dbt-postgres (rather than also being postgres metadata). i.e. execute SQL like this:

primary key (table_pk)

instead of this:

constraint pk_int__table_name primary key (table_pk)

The Postgres docs mention that the value of named constraints are twofold:

This clarifies error messages and allows you to refer to the constraint when you need to change it

So if we remove the constraint names from the DDL, then error messages may not have a have a human-readable constraint name, and users may not be able to refer to the constraint to change it.

These both sound acceptable to give up in order to avoid this problem.

dbeatty10 commented 10 months ago

Relevant source code

Here is the base implementation that adds constraint {constraint.name}.

If this issue only affects dbt-postgres, then it would need to override render_model_constraint with a custom implementation that removes this constraint naming.

If this issue affects many/most adapters, then we could just remove the constraint naming by default (but adapters could add it back in by overriding render_model_constraint).

mcooper-pi commented 10 months ago

Maybe we we just make the constraint name as only dbt metadata for dbt-postgres

This seems like a good idea. Personally I would rather be able to maintain my naming conventions and have it build successfully but have to track down errors, than to have it not build at all.

I appreciate you looking into this. Is there anything else you need from my end?

dbeatty10 commented 10 months ago

I appreciate you looking into this. Is there anything else you need from my end?

Thank you for finding and writing it up 🙏

We don't need anything else from your end now that we've fully reproduced the issue.

jtcohen6 commented 4 months ago

Moving to dbt-adapters now that the relevant implementation logic has also moved there. This is still low priority for us, given the provided workaround. If we decide on a narrower fix for dbt-postgres only, we can move there.