dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Bug] DBT docs erroneously complains about duplicate definition of resources #9836

Open pvanderlinden opened 5 months ago

pvanderlinden commented 5 months ago

Is this a new bug in dbt-core?

Current Behavior

dbt docs generate gives the following error:

Compilation Error
  dbt found two schema.yml entries for the same resource named model.name. Resources and their associated columns may only be described a single time. To fix this, remove one of the resource entries for model.name in this file:
   - models/xxx/properties.yml

However, the resource is defined only once.
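One rough way to double-check that a resource really is described only once is to scan the project's YAML files for repeated resource names. The sketch below is a heuristic and not how dbt itself detects duplicates: `find_duplicate_names` is a hypothetical helper, and it assumes resource entries are indented by at most two spaces so that the more deeply nested column entries are skipped.

```shell
# Hypothetical helper: print every resource name that appears more than once
# in "- name:" entries across the YAML files under a directory.
# Assumption: resource entries are indented by at most two spaces, so
# column-level "- name:" lines (indented deeper) are ignored.
find_duplicate_names() {
  dir="${1:-models}"
  find "$dir" -type f \( -name '*.yml' -o -name '*.yaml' \) \
      -exec grep -hE '^ {0,2}- +name:' {} + \
    | sed -E 's/^ *- +name: *//; s/ +$//' \
    | tr -d "\"'" \
    | sort | uniq -d
}
```

Running `find_duplicate_names models` prints nothing when every name is unique. A name shared by two different resource types (for example a model and a seed) would also be listed, so treat any hit as a lead to inspect rather than as proof of a duplicate.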

Expected Behavior

generate the docs

Steps To Reproduce

Relevant log output

Example without modifying any files:

$ dbt docs generate                                                                                                                           
15:45:23  Running with dbt=1.7.11
15:45:23  Registered adapter: snowflake=1.7.3
15:45:23  Encountered an error:
Compilation Error
  dbt found two schema.yml entries for the same resource named model.name. Resources and their associated columns may only be described a single time. To fix this, remove one of the resource entries for model.name in this file:
   - models/some_models/properties.yml

$ dbt clean                                                                                                                                 
15:45:31  Running with dbt=1.7.11
15:45:31  Checking /home/paul/projects/dbt/dbt_packages/*
15:45:31  Cleaned /home/paul/projects/dbt/dbt_packages/*
15:45:31  Checking /home/paul/projects/dbt/target/*
15:45:31  Cleaned /home/paul/projects/dbt/target/*
15:45:31  Finished cleaning all paths.

$ dbt docs generate
15:45:37  Running with dbt=1.7.11
15:45:37  Registered adapter: snowflake=1.7.3
15:45:37  Unable to do partial parsing because saved manifest not found. Starting full parse.
15:45:38  Found 115 models, 32 seeds, 3 operations, 2 tests, 0 sources, 0 exposures, 0 metrics, 449 macros, 0 groups, 0 semantic models
15:45:38
15:45:45  Concurrency: 2 threads (target='dev')
15:45:45
15:45:49  Building catalog
15:45:54  Catalog written to /home/paul/projects/dbt/target/catalog.json

Environment

- OS: Arch Linux
- Python: 3.11.6
- dbt: 1.7.11

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

dbeatty10 commented 5 months ago

Thanks for raising this issue, @pvanderlinden!

This looks similar to https://github.com/dbt-labs/dbt-core/issues/4233, but has a distinct error message.

Two questions:

  1. Does it work for you if you use the --no-partial-parse flag (like dbt docs generate --no-partial-parse)? It should give the same benefit as dbt clean, but a little more ergonomically.
  2. I wasn't able to reproduce this. Can you try to find a simple set of files (and workflow steps) that will reliably reproduce the issue you are seeing?
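
Both dbt clean and --no-partial-parse end up forcing a full parse. A narrower alternative to dbt clean, sketched below, is to delete only the saved partial-parse state and leave the rest of the target directory alone. This assumes the default target directory and that the state lives in target/partial_parse.msgpack; `reset_partial_parse` is a hypothetical helper name.

```shell
# Hypothetical helper: remove only dbt's saved partial-parse state so the
# next invocation falls back to a full parse. Assumes the state file is
# <target-dir>/partial_parse.msgpack (target-dir defaults to ./target).
reset_partial_parse() {
  target_dir="${1:-target}"
  rm -f "$target_dir/partial_parse.msgpack"
}

# Usage (run from the project root):
#   reset_partial_parse && dbt docs generate
# or skip the helper and disable partial parsing for one invocation:
#   dbt docs generate --no-partial-parse
```

Unlike dbt clean, this keeps installed packages and other artifacts in place, so no reinstall is needed afterwards.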

pvanderlinden commented 5 months ago

Thanks @dbeatty10, --no-partial-parse also works around the issue. I just figured out how to reproduce it.

Reproduce steps:

We have this macro in place to have tables/views in different schemas with the same name, e.g.

db.schema1.table_one
db.schema2.table_one

As we can't have two files named table_one.sql, even in different directories, we have to prefix the file names (and the prefix is then included in the "ref").

dbeatty10 commented 5 months ago

Thanks for sharing these details @pvanderlinden 🏆

I was able to reproduce what you are seeing using dbt-duckdb. See below for the smallest reproducible example I could find.

The most impactful piece was:

Details

Reprex

Assuming you already have a simple dbt_project.yml with a dbt profile configured, here's all that is needed to reproduce:

dbt clean
mkdir -p models
cat << EOF > models/my_first_dbt_model.sql
select 1 as id
EOF
cat << EOF > models/test.my_second_dbt_model.sql
-- {{ ref("my_first_dbt_model") }}
select 2 as id
EOF
cat << EOF > models/_models.yml
models:
  - name: "my_first_dbt_model"
    description: My favorite color is blue
  - name: "test.my_second_dbt_model"
    description: My favorite color is green
EOF
dbt docs generate
cat << EOF > models/_models.yml
models:
  - name: "my_first_dbt_model"
    description: My favorite color is blue
  - name: "test.my_second_dbt_model"
    description: My favorite color is (not actually) green
EOF
dbt docs generate

pvanderlinden commented 5 months ago

A few extra observations on this issue:

dbeatty10 commented 5 months ago

These are great observations!

Not directly the same issue: if you go to the table in the docs, the name is correct in the tree on the left, but the title of the page still includes the stripped-off prefix.

Yep, this looks separate. If you think this is an issue, feel free to open a bug report in the dbt-docs repo, and we can take a closer look.