Closed siljamardla closed 5 months ago
Thanks for reaching out @siljamardla -- indeed state selection can be tricky to think about!
I don't think this is a bug. Let me explain why.
Suppose you have the following simple models in your dbt project and you start with a set of production artifacts:
When you do a dbt compile --target dev
, it will:
dev
target for all references👉 Not specifying --select
is equivalent to selecting all nodes.
When you add --defer --favor-state --state prod-run-artifacts
, then it will:
prod-run-artifacts
for references to nodes that were not selected.👉 --defer --state
only comes into play for any un-selected nodes.
These are all equivalent:
dbt compile --target dev --defer --favor-state --state prod-run-artifacts
dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select "fqn:*"
dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select model_c model_b model_a
And they all compile to this result (target/compiled/my_project/models/model_c.sql
):
-- model_c: "db"."feature_456"."model_c"
-- depends on:
-- model_a: "db"."feature_456"."model_a"
-- model_b: "db"."feature_456"."model_b"
select 1 as id
Either of the examples below will compile with different output than above since some nodes are not selected.
dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select model_c model_b
is the same as:
dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select model_c model_b model_a --exclude model_a
is the same as:
dbt compile --target dev --defer --favor-state --state prod-run-artifacts --exclude model_a
Compiled output:
-- model_c: "db"."feature_456"."model_c"
-- depends on:
-- model_a: "db"."prod"."model_a"
-- model_b: "db"."feature_456"."model_b"
select 1 as id
dbt compile --target dev --defer --favor-state --state prod-run-artifacts --select model_c
Compiled output:
-- model_c: "db"."feature_456"."model_c"
-- depends on:
-- model_a: "db"."prod"."model_a"
-- model_b: "db"."prod"."model_b"
select 1 as id
dbt compile --target dev --defer --favor-state --state prod-run-artifacts
. The only thing that changes is what follows --select
.--favor-state
flag in the examples above. But the general principles stay the same when that flag is not added.I'm going to close this a not a bug, but please let me know I've missed something important and we can re-consider. 🙏
@dbeatty10
Thank you for the detailed examples, I get what you mean by defer never using production schema for those models that have been included in the --select
clause.
I've actually come back to this issue because I was trying to use defer for testing. I guess I'm hitting the same concept here. I tried something like dbt test --select my_model --defer --state target/live
. The model has a not null test defined for the order_key
column.
I was hoping I could compile and run the test for live data. Because, well, it would be good to test my test before I deploy it to live. And testing my test would be much more convenient and appropriate on live data, not on dev data.
However, the compiled test reads:
select order_key
from dev_schema.my_model
where order_key is null
Is there a way to compile the test to run on live? If not, should I create a feature request? What could be a good way to achieve this functionality?
Edit:
Out of curiosity I've also defined a more complex (dbt-expectations) test that has a ref
to another table in it:
tests:
- dbt_expectations.expect_table_aggregation_to_equal_other_table:
expression: count(*)
compare_model: source("source_name","source_table_for_my_model")
compare_expression: count(*)
group_by: [created_date]
compare_group_by: [created_date]
row_condition:
created_date = (SELECT max(created_date) from {{ref('my_other_model')}})
compare_row_condition: >
created_date = (SELECT max(created_date) from {{ref('my_other_model')}})
I was thinking that maybe this reference to another table will compile for the live schema because it's not included in the --select
when I run the test on my_model
. But no, it will still compile to dev_schema.my_other_model
in the test script.
Is this a new bug in dbt-core?
Current Behavior
Defer does not work with simple compile. This will compile to SQL that uses the dev schema tables:
However, if you need to compile production SQL for a specific table, this will work:
Expected Behavior
Any dbt command executed with
--defer
should have the same behaviour of combining live data with data that exist in the dev environment.Steps To Reproduce
dbt compile --defer --state target/live
(this is where I store the latest live manifest)dbt compile --defer --state target/live
againdbt compile --defer --state target/live --select my_model_name > target/tmp_my_model_name.sql
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
spark
Additional Context
https://github.com/dbt-labs/docs.getdbt.com/pull/5162