Closed jelstongreen closed 1 year ago
Thanks for opening this @jelstongreen !
Before we get into the details of this bug report, you might be interested in subscribing to https://github.com/dbt-labs/dbt-core/issues/7256 since the big idea in that feature is:
Clone my production state into my development schema, please!
Could you help me reproduce this by supplying example model .sql file(s) (and any related YAML configuration files)?
And also the commands to trigger the error?
i.e., something like this:
models/my_incremental_model.sql
{{ config(
materialized='incremental',
pre_hook="SQL-statement",
...
) }}
select ...
dbt_project.yml
# anything applicable from a dbt_project.yml file here
models/_models.yml
# anything applicable from a (model) properties file here
Commands:
dbt build --full-refresh
dbt run --select my_incremental_model
Hi @dbeatty10. Here you go:
models/my_incremental_model.sql
{{ config(
materialized='incremental',
incremental_strategy='merge',
unique_key="unique_id",
pre_hook = [
"""
{{ shallow_clone_production() }}
"""
],
post_hook = [
"""
{{ optimize_and_zorder(zorder_cols=['col_a','col_b']) }}
"""
]
) }}
SELECT col_a, col_b...
shallow_clone_production.sql
{%- macro shallow_clone_production() %}
{% if target.name in ('dev','testing') %}
{% if var('clone_incremental_dev', 'True') == 'True' %}
CREATE TABLE IF NOT EXISTS {{ this }}
SHALLOW CLONE {{ var('prod_schema') }}.{{ this.name }}
{% endif %}
{% endif %}
{%- endmacro %}
command
dbt run -s my_incremental_model --project-dir projects/my_project
Given that this is being looked at for 1.6, I imagine this isn't going to be a priority. I can create a workaround in the meantime and run the macro as an operation in a separate step when invoking dbt, unless this is something you'd like to look at.
Thanks for that example @jelstongreen -- that helps me see exactly what is going on.
When you invoke dbt like dbt run -s my_model --target dev with the setup you described, here's what it does:
my_model doesn't exist yet, so it will need to be created from scratch rather than inserting into it
So you can see from the above how the error follows:
Running a macro as an operation is a good workaround until v1.6 comes out.
Another option you could try if you are so inclined: override the spark__create_table_as macro to be create table if not exists rather than just a create table.
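For the run-operation workaround, a minimal sketch might look like the following. The macro name clone_prod_table and its model_name argument are hypothetical. The sketch assumes the prod_schema var from the example above and assumes the dev table should live in target.schema, which may not hold if the model uses a custom schema. Because a run-operation macro is not rendered in the context of a model, it takes the table name as an argument instead of relying on this.

```sql
{% macro clone_prod_table(model_name) %}
  {# Hypothetical helper, not part of dbt: clones only in dev-like targets, mirroring the pre-hook above #}
  {% if target.name in ('dev', 'testing') %}
    {% set clone_sql %}
      CREATE TABLE IF NOT EXISTS {{ target.schema }}.{{ model_name }}
      SHALLOW CLONE {{ var('prod_schema') }}.{{ model_name }}
    {% endset %}
    {# run_query executes the statement immediately, before any model is compiled #}
    {% do run_query(clone_sql) %}
  {% endif %}
{% endmacro %}
```

It would be invoked as its own step before the model run, for example:
dbt run-operation clone_prod_table --args '{model_name: my_incremental_model}'
dbt run -s my_incremental_model
With the clone created first, the table already exists by the time the model is compiled, so the model should run in incremental mode.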
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
can we check the status of this please?
@amyfeng-klaviyo we've closed this issue since we didn't find any bug within dbt-core to fix.
If you think you're experiencing a bug within dbt-core, could you open a new issue here and describe it in detail so we can try to reproduce it?
If you have a substantially similar use-case to the one jelstongreen described here, then I'd suggest one of the following:
Give dbt clone a try (starting in v1.6)
Override the spark__create_table_as macro to be create table if not exists rather than just a create table
@dbeatty10 thanks for the update! My question is really the same as the bug described above. We have set up a pre-hook macro to clone the production table for an incremental model (mainly for CI). While I'm testing the pre-hook locally, the table:
Is this a new bug in dbt-core?
Current Behavior
When running incremental models in a dev / CI environment I am trying to shallow clone the production tables into the dev schema to fully simulate how the changes might affect the production model. Unfortunately, whether a model should be run with is_incremental or not is determined at compile time, before the pre-hook creating the dev version of the model is run. This leads to dbt running the dev model from scratch even though it already exists.
Expected Behavior
Ideally, the check for whether to run the model in incremental mode or not should be executed at runtime.
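To illustrate why the timing matters: the check boils down to asking whether the target relation already exists at the moment the model's SQL is rendered. Below is a simplified sketch of that kind of check. The macro name is_incremental_sketch is hypothetical and this is not the actual dbt-core source, which may differ.

```sql
{% macro is_incremental_sketch() %}
  {# Hypothetical, simplified illustration; not the dbt-core implementation #}
  {% if not execute %}
    {# During parsing nothing is queried, so report non-incremental #}
    {{ return(false) }}
  {% endif %}
  {# Look up the target relation at the moment the model SQL is rendered #}
  {% set existing = adapter.get_relation(
      database=this.database,
      schema=this.schema,
      identifier=this.identifier
  ) %}
  {{ return(
      existing is not none
      and existing.type == 'table'
      and model.config.materialized == 'incremental'
      and not should_full_refresh()
  ) }}
{% endmacro %}
```

If the pre-hook creates the clone only after this lookup has run, the lookup finds no relation and the model is built in non-incremental mode, which matches the behavior described above.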
Steps To Reproduce
Opened a fresh branch which will execute tables in branch db. Run the model and it executes: pre-hook:
model:
If I then cancel that query and rerun dbt: pre-hook:
model:
The only difference between the two runs is that the dev version of the model (the clone) exists in the 2nd one at compile time.
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
spark, other (mention it in "Additional Context")
Additional Context
Databricks