dbt-labs / dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
https://getdbt.com
Apache License 2.0

[ADAP-1096] [feature] Dynamic Tables: allow altering and updating model definition (`SELECT`) without a full refresh #889

Open awal11 opened 8 months ago

awal11 commented 8 months ago

Is this a new bug in dbt-snowflake?

Current Behavior

When I alter a model materialized as a dynamic table and deploy it the same way I deploy other materializations (view, table), it appears to deploy properly, as suggested by the messages:

    dbt build --select your_model
    13:14:48  1 of 12 START sql dynamic_table model your_model_v1 [RUN]
    13:14:51  1 of 12 OK created sql dynamic_table model your_model_v1 [SUCCESS 1 in 3.21s]

but in reality the model's SQL is not being updated; only the dynamic table parameters are.

The experience is different from that with views or regular tables, and the messages shown above do not match what is actually happening in the database.

I only just discovered the code comment 'Dynamic table SQL cannot be updated; the dynamic table must go through a `--full-refresh`', and only as a result of serious head-scratching.

Combined with #888, this really undermines the otherwise great first impression I had of the dynamic tables implementation.

Expected Behavior

Ideally, the dynamic table's SQL should always be updated when the model changes, without the need to specify additional parameters.

At minimum:

  1. The message "OK created sql dynamic_table model your_model_v1" should be changed to something like "OK updated dynamic table model parameters for your_model_v1".
  2. In the documentation at https://docs.getdbt.com/reference/resource-configs/snowflake-configs, in the example describing the dynamic table and the "Project file"/"Property file"/"Config block", please add the parameter `+full_refresh: true`.
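
For context, a dynamic table model in dbt-snowflake is configured roughly like this (a sketch based on the linked docs page; the model name, warehouse name, and upstream model are placeholders):

```sql
-- models/your_model.sql -- illustrative only; names are placeholders
{{ config(
    materialized = 'dynamic_table',
    snowflake_warehouse = 'transforming',
    target_lag = '5 minutes'
) }}

select *
from {{ ref('upstream_model') }}
```

Changes to parameters like `target_lag` or `snowflake_warehouse` are applied in place; it is changes to the `select` body itself that currently require a full refresh, which is the gap this issue is about.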

Steps To Reproduce

It's a known limitation

Relevant log output

No response

Environment

- OS: Windows 10
- Python: 3.10.5
- dbt-core: 1.7.4
- dbt-snowflake: 1.7.1

Additional Context

No response

ewhauser commented 8 months ago

In document https://docs.getdbt.com/reference/resource-configs/snowflake-configs, in the example describing the dynamic table and the "Project file"/"Property file"/"Config block" - please add parameter +full_refresh: true

If you add full_refresh: true, then the table will get refreshed every time you run dbt and you lose the benefits.

awal11 commented 8 months ago

If you add full_refresh: true, then the table will get refreshed every time you run dbt and you lose the benefits.

Clearly yes, but how do I get the model to actually be updated in the database when I change it? Without `full_refresh`, nothing was happening...
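
One way to do that without hard-coding `full_refresh: true` in the project (standard dbt usage, not something confirmed in this thread) is to pass the flag on the command line for just the affected model:

```shell
# One-off full refresh of a single model; this drops and
# recreates the dynamic table with the new SQL
dbt run --select your_model --full-refresh
```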

mikealfare commented 7 months ago

@awal11 Just for my clarification: you're only changing the SQL for the model, but no other configurations, correct? In that case, it is expected behavior that the model does not change; only the configuration is subject to change monitoring. SQL changes need to be deployed via a full refresh (more info below). However, I am surprised that the standard create log message is shown. If the configuration has not changed (regardless of whether the SQL changed), then this message should show up:

https://github.com/dbt-labs/dbt-snowflake/blob/3dc838d1ae0083aac384cac091bab83f971b5c90/dbt/include/snowflake/macros/materializations/dynamic_table.sql#L42-L44

Can you please confirm that you're not seeing "No configuration changes were identified on: your_model_v1. Continuing." in the logs?

I just discovered that comment: 'Dynamic table SQL cannot be updated; the dynamic table must go through a --full-refresh', but only as a result of serious head scratching.

This is the result of a Snowflake limitation. The SQL that defines a dynamic table cannot be updated through an ALTER statement. To update the SQL, the dynamic table must be dropped and recreated. In dbt parlance, this is a full refresh.
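
In Snowflake terms, the asymmetry looks roughly like this (a sketch; the table name, lag, and warehouse are placeholders):

```sql
-- Parameters such as the lag can be changed in place:
ALTER DYNAMIC TABLE your_model SET TARGET_LAG = '10 minutes';

-- But there is no ALTER form that changes the defining query;
-- updating the SELECT means recreating the table, which is
-- what dbt's --full-refresh does:
CREATE OR REPLACE DYNAMIC TABLE your_model
  TARGET_LAG = '10 minutes'
  WAREHOUSE = transforming
  AS
  SELECT ...;  -- the new model SQL
```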

If you add full_refresh: true, then the table will get refreshed every time you run dbt and you lose the benefits.

That is correct; you wouldn't want to use full refresh every time. It would act somewhat like a table in that case (though unlike a table, it would still get updated in between runs), but you would lose the performance and cost benefits with each run. Ideally you would only use full refresh to "deploy" your new model.

Our thought process for dynamic tables was that they behave more like views from a deployment perspective. The only reason you would need to run dbt run on a view or dynamic table is because something changed on them, whether that's configuration or the SQL itself. After that, data is either updated regularly (DTs) or is always current (views). This would differ from tables and incremental models where dbt run is also required to update the data.

ewhauser commented 7 months ago

@mikealfare This all makes sense but I've also been struggling with how best to handle this in our deployments as I don't want the team to have to manually do things when dynamic tables change.

I understand the limitation around not having ALTER. I'm wondering if the right approach here would be to run with `--defer --full-refresh`? In theory, that should only rebuild models that have changed and give the right behavior?
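
A sketch of that idea using dbt's state-comparison selectors (the artifacts path is a placeholder, and whether this interacts cleanly with dynamic tables is exactly the open question here):

```shell
# Compare against the manifest from the last production run and
# fully refresh only the models whose definitions changed:
dbt run --select state:modified --defer --state path/to/prod-artifacts --full-refresh
```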

awal11 commented 7 months ago

@mikealfare

  1. When running a regular `dbt run --select your_model` (without enforcing full_refresh anywhere) several times, in every combination, whether I do or don't change the model: I have never seen "No configuration changes" in dbt.log. The main output always happily shows `1 of 1 OK created sql dynamic_table model your_model_v1 [SUCCESS 1 in 7.36s]`, while the log shows it is only doing alters on comments and lag. I could share the log, but it contains too much company detail to upload here.

  2. Our main goal is to get rid of 'dbt run' for refreshing the data completely. We'd like to use only views and dynamic tables and use 'dbt run' only for deployments when the code is updated, so I think exactly as you described.

mikealfare commented 7 months ago

It makes sense that `1 of 1 OK` appears, as well as `[SUCCESS 1 in 7.36s]`. I wonder how the `created sql` part gets inserted, as that is a bit misleading. `No configuration changes` should appear in the detailed logs either way. We'll see if we can reproduce it on our side.