Open yengibar-manasyan-sp opened 2 months ago
Thanks for reporting this @yengibar-manasyan-sp !
It appears that the root cause is that during unit testing, `source`s aren't guaranteed to have a CTE that is unique from `ref`s or other `source`s.
See below for a reproducible example ("reprex") of the issue you reported when a model and a source have the same name (`case_1`). It also includes a separate issue when two sources have the same name (`case_2`).
Looking at the output files like the following helps highlight what is going wrong:
target/run/my_project/models/_unit_tests.yml/models/test_case_0.sql
target/run/my_project/models/_unit_tests.yml/models/test_case_1.sql
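To make the collision concrete, here is a minimal Python sketch. It is not dbt's actual code: `cte_name_from_name` and `cte_name_from_unique_id` are hypothetical helpers, the sketch assumes the CTE name is derived from the node name alone, and the two `unique_id` values are borrowed from the example CTE names further down in this comment.

```python
# Illustrative only: hypothetical helpers, not dbt's implementation.
CTE_PREFIX = "__dbt__cte__"  # prefix seen in the generated unit test SQL


def cte_name_from_name(unique_id: str) -> str:
    """Name the CTE after the last segment of the unique_id (the node name)."""
    return CTE_PREFIX + unique_id.split(".")[-1]


def cte_name_from_unique_id(unique_id: str) -> str:
    """Name the CTE after the whole unique_id, which is unique across the DAG."""
    return CTE_PREFIX + unique_id


source_id = "source.my_project.src_1.model_a"
model_id = "model.my_project.model_a"

# Name-based CTE names collide when a source and a model share a name.
print(cte_name_from_name(source_id) == cte_name_from_name(model_id))

# unique_id-based CTE names stay distinct.
print(cte_name_from_unique_id(source_id) == cte_name_from_unique_id(model_id))
```

The first comparison is true (both names reduce to `__dbt__cte__model_a`) and the second is false, which is the property the unit test CTEs need.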
The key outcome we'd need is:
One thing that is already guaranteed to be unique across the DAG is `unique_id`, so possibly that could be transformed into a suitable CTE name.
Some related code is described in https://github.com/dbt-labs/dbt-core/issues/5273#issuecomment-1131543844.
https://github.com/dbt-labs/dbt-adapters/pull/236 and https://github.com/dbt-labs/dbt-core/pull/10290 have proposed updates to the code that generates the CTE name, but I don't think either would fix this particular issue with unit tests.
Specifically, we could generate some kind of `globally_unique_identifier` and then use it like this:
return self.adapter.Relation.add_ephemeral_prefix(globally_unique_identifier)
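As a rough sketch of that call, assuming the `unique_id` itself serves as the globally unique identifier: the `add_ephemeral_prefix` function below is a standalone stand-in whose behavior (prepending `__dbt__cte__`) is an assumption based on the generated SQL, and `quoted_cte_name` is a hypothetical helper. Quoting is included because a `unique_id` contains dots, which are not valid in an unquoted SQL identifier.

```python
# Illustrative only: hypothetical helpers, not dbt's implementation.
def add_ephemeral_prefix(name: str) -> str:
    """Stand-in for the adapter's Relation.add_ephemeral_prefix (assumed behavior)."""
    return f"__dbt__cte__{name}"


def quoted_cte_name(unique_id: str, quote_char: str = '"') -> str:
    """Prefix the unique_id and wrap it in quotes so the dots are legal in SQL."""
    prefixed = add_ephemeral_prefix(unique_id)
    # Escape any embedded quote characters by doubling them.
    escaped = prefixed.replace(quote_char, quote_char * 2)
    return f"{quote_char}{escaped}{quote_char}"


print(quoted_cte_name("source.my_project.src_1.model_a"))
print(quoted_cte_name("model.my_project.model_a"))
```

The quote character is a parameter here because it differs between warehouses; a real implementation would presumably take it from the adapter rather than hard-code it.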
Here are a couple of ideas for how to create the CTE names in unit tests so that they are unique:
`unique_id` + quoted identifiers
Example:
with "__dbt__cte__source.my_project.src_1.model_a" as (
...
),
"__dbt__cte__model.my_project.model_a" as (
Pros:
`unique_id`
`unique_id` at all -- just quotes it
Cons:
`unique_id` + quoted identifiers
Examples:
with __dbt__cte__e83c5163316f89bfbde7d9ab23ca2e25604af290 as (
or
with __dbt__cte__dim_customers_e83c5163316f as (
Pros:
unique_id
Cons:
We could hash the `unique_id` similar to `_create_sha1_hash`:
If we don't care about readability of the CTE name, then we could just use the hexdigest as-is which for SHA1 would be a 40-character value like this:
e83c5163316f89bfbde7d9ab23ca2e25604af290
If there are readability concerns, then we could combine an abbreviated version of the hash along with the node `name` (or `identifier`), which might look like this:
dim_customers_e83c5163316f
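Both variants can be sketched with Python's standard `hashlib`. This is only an illustration: the helper names, the 12-character abbreviation (matching the length of the example above), and the `unique_id` value are assumptions, and the code does not reproduce dbt's `_create_sha1_hash`.

```python
import hashlib

# Illustrative only: hypothetical helpers, not dbt's implementation.
CTE_PREFIX = "__dbt__cte__"


def sha1_hexdigest(unique_id: str) -> str:
    """Return the 40-character SHA1 hex digest of a unique_id."""
    return hashlib.sha1(unique_id.encode("utf-8")).hexdigest()


def hashed_cte_name(unique_id: str) -> str:
    """Unreadable but fixed-length: prefix + full hex digest."""
    return f"{CTE_PREFIX}{sha1_hexdigest(unique_id)}"


def readable_cte_name(unique_id: str, name: str, hash_length: int = 12) -> str:
    """More readable: prefix + node name + abbreviated hex digest."""
    return f"{CTE_PREFIX}{name}_{sha1_hexdigest(unique_id)[:hash_length]}"


unique_id = "model.my_project.dim_customers"  # hypothetical unique_id
print(hashed_cte_name(unique_id))
print(readable_cte_name(unique_id, "dim_customers"))
```

Abbreviating the digest trades a slightly higher chance of collisions for shorter names; the node name in front keeps the CTE recognizable when reading the compiled SQL.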
Here are some other ideas:
`alias` that is only used during unit tests
Similar to https://github.com/dbt-labs/dbt-core/pull/10290
Example:
with "__dbt__cte__PROVIDED_SOURCE_ALIAS_HERE" as (
...
),
"__dbt__cte__model_a" as (
Pros:
Cons:
`alias`, it would be confusing because it wouldn't behave the exact same as `alias` for other resources.
Pros:
Cons:
We've just come across this issue ourselves. Are you planning to contribute a fix for this in the near future, @dbeatty10?
Is this a new bug in dbt-core?
Current Behavior
One of my models had the same name as the source Snowflake table. They live in different schemas.
`dbt run` works fine and generates all the data as expected. However, dbt unit tests fail because they cannot differentiate mocked source tables from the identically named model tables. Here is an example:
When I checked the generated SQL query, I saw that it adds `__dbt__cte__` prefixes to the CTE names and replaces the rows for `source('src', 'MODEL_1')` with values from `this`.
Expected Behavior
Fix unit tests to support identical model and source table names in different schemas.
Steps To Reproduce
`this` (the model itself)
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context
No response