dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Bug] Additional whitespace in `target_database` causes dbt to replace snapshot entirely (ctas instead of insert/merge) #10356

Open jeremyyeo opened 5 days ago


Is this a new bug in dbt-core?

Current Behavior

Potential footgun: if you forget to remove whitespace from the `target_database` config of a snapshot, dbt fails to detect that the snapshot already exists and therefore does a CTAS (`create or replace table`) every time.

Expected Behavior

This doesn't happen for incremental models: even if the `database` config contains the same whitespace, dbt correctly detects that the model already exists and inserts/merges as expected. Snapshots should behave the same way.

Steps To Reproduce

  1. Project setup.
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"

models:
  my_dbt_project:
    +database: |
        {%- if target.name == 'sf' %} development_jyeo
        {%- else -%} development
        {%- endif -%}

snapshots:
  my_dbt_project:
    +target_database: |
        {%- if target.name == 'sf' %} development_jyeo
        {%- else -%} development
        {%- endif -%}
    +transient: false

Notice the missing `-` before the `%}` that closes the `if target.name` conditional, plus the space just before the database name. Jinja therefore keeps the whitespace that follows the tag, so this renders as `target_database = ' development_jyeo'` instead of `target_database = 'development_jyeo'` (a leading space in the name).
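For comparison, adding the trailing `-` (Jinja whitespace control) to the opening tag renders the value with no stray space; the repro files below deliberately keep the buggy version:

```yaml
snapshots:
  my_dbt_project:
    +target_database: |
        {%- if target.name == 'sf' -%} development_jyeo
        {%- else -%} development
        {%- endif -%}
    +transient: false
```

With `-%}` on every tag, the rendered value is exactly `development_jyeo`. Piping the expression through Jinja's `trim` filter would work as well.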

-- models/foo.sql
{{ config(materialized='incremental') }}
select 1 id

-- snapshots/snappy.sql
{% snapshot snappy %}

{{
    config(
      target_schema='dbt_jyeo_snapshot',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at',
    )
}}

select 1 id, 'alice' as name, '1970-01-01'::date as updated_at

{% endsnapshot %}
  2. Drop all existing tables first:
drop table if exists development_jyeo.dbt_jyeo.foo;
drop table if exists development_jyeo.dbt_jyeo_snapshots.snappy;
  3. Build for the first time:
$ dbt --debug build
00:52:31  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1159992d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1159eb0d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1159ebf50>]}
00:52:31  Running with dbt=1.7.16
00:52:31  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'True', 'fail_fast': 'False', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'version_check': 'True', 'profiles_dir': '/Users/jeremy/.dbt', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt --debug build', 'log_format': 'default', 'introspect': 'True', 'target_path': 'None', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}
00:52:31  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '0d5f2cf5-a986-4937-936d-a3a592f321cd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x117af1090>]}
00:52:31  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '0d5f2cf5-a986-4937-936d-a3a592f321cd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1255ca090>]}
00:52:31  Registered adapter: snowflake=1.7.5
00:52:31  checksum: b42e23759fbef548c33c6e49ebf263a93f046443b71b5f37fe49de1c6429a7cf, vars: {}, profile: , target: , version: 1.7.16
00:52:31  Unable to do partial parsing because a project config has changed
00:52:31  Sending event: {'category': 'dbt', 'action': 'partial_parser', 'label': '0d5f2cf5-a986-4937-936d-a3a592f321cd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x126cabfd0>]}
00:52:32  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '0d5f2cf5-a986-4937-936d-a3a592f321cd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x126e97350>]}
00:52:32  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '0d5f2cf5-a986-4937-936d-a3a592f321cd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x127260410>]}
00:52:32  Found 1 model, 1 snapshot, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
00:52:32  
00:52:32  Acquiring new snowflake connection 'master'
00:52:32  Acquiring new snowflake connection 'list_ development_jyeo'
00:52:32  Using snowflake connection "list_ development_jyeo"
00:52:32  On list_ development_jyeo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "connection_name": "list_ development_jyeo"} */
show terse schemas in database  development_jyeo
    limit 10000
00:52:32  Opening a new connection, currently in state init
00:52:34  SQL status: SUCCESS 44 in 3.0 seconds
00:52:35  On list_ development_jyeo: Close
00:52:35  Re-using an available connection from the pool (formerly list_ development_jyeo, now list_development_jyeo)
00:52:35  Using snowflake connection "list_development_jyeo"
00:52:35  On list_development_jyeo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_jyeo"} */
show terse schemas in database development_jyeo
    limit 10000
00:52:35  Opening a new connection, currently in state closed
00:52:37  SQL status: SUCCESS 44 in 2.0 seconds
00:52:37  On list_development_jyeo: Close
00:52:37  Re-using an available connection from the pool (formerly list_development_jyeo, now list_ development_jyeo_dbt_jyeo_snapshot)
00:52:37  Using snowflake connection "list_ development_jyeo_dbt_jyeo_snapshot"
00:52:37  On list_ development_jyeo_dbt_jyeo_snapshot: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "connection_name": "list_ development_jyeo_dbt_jyeo_snapshot"} */
show objects in  development_jyeo.dbt_jyeo_snapshot limit 10000
00:52:37  Opening a new connection, currently in state closed
00:52:39  SQL status: SUCCESS 1 in 2.0 seconds
00:52:39  On list_ development_jyeo_dbt_jyeo_snapshot: Close
00:52:40  Re-using an available connection from the pool (formerly list_ development_jyeo_dbt_jyeo_snapshot, now list_development_jyeo_dbt_jyeo)
00:52:40  Using snowflake connection "list_development_jyeo_dbt_jyeo"
00:52:40  On list_development_jyeo_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_jyeo_dbt_jyeo"} */
show objects in development_jyeo.dbt_jyeo limit 10000
00:52:40  Opening a new connection, currently in state closed
00:52:42  SQL status: SUCCESS 250 in 2.0 seconds
00:52:42  On list_development_jyeo_dbt_jyeo: Close
00:52:42  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '0d5f2cf5-a986-4937-936d-a3a592f321cd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12746f710>]}
00:52:42  Concurrency: 1 threads (target='sf')
00:52:42  
00:52:42  Began running node model.my_dbt_project.foo
00:52:42  1 of 2 START sql incremental model dbt_jyeo.foo ................................ [RUN]
00:52:42  Re-using an available connection from the pool (formerly list_development_jyeo_dbt_jyeo, now model.my_dbt_project.foo)
00:52:42  Began compiling node model.my_dbt_project.foo
00:52:42  Writing injected SQL for node "model.my_dbt_project.foo"
00:52:42  Timing info for model.my_dbt_project.foo (compile): 12:52:42.928249 => 12:52:42.932603
00:52:42  Began executing node model.my_dbt_project.foo
00:52:42  Writing runtime sql for node "model.my_dbt_project.foo"
00:52:42  Using snowflake connection "model.my_dbt_project.foo"
00:52:42  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
         as
        (-- models/foo.sql

select 1 id
        );
00:52:42  Opening a new connection, currently in state closed
00:52:45  SQL status: SUCCESS 1 in 2.0 seconds
00:52:45  Applying DROP to: development_jyeo.dbt_jyeo.foo__dbt_tmp
00:52:45  Using snowflake connection "model.my_dbt_project.foo"
00:52:45  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
drop view if exists development_jyeo.dbt_jyeo.foo__dbt_tmp cascade
00:52:45  SQL status: SUCCESS 1 in 0.0 seconds
00:52:45  Timing info for model.my_dbt_project.foo (execute): 12:52:42.933226 => 12:52:45.879727
00:52:45  On model.my_dbt_project.foo: Close
00:52:46  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '0d5f2cf5-a986-4937-936d-a3a592f321cd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x127393350>]}
00:52:46  1 of 2 OK created sql incremental model dbt_jyeo.foo ........................... [SUCCESS 1 in 3.66s]
00:52:46  Finished running node model.my_dbt_project.foo
00:52:46  Began running node snapshot.my_dbt_project.snappy
00:52:46  2 of 2 START snapshot  development_jyeo.dbt_jyeo_snapshot.snappy ............... [RUN]
00:52:46  Re-using an available connection from the pool (formerly model.my_dbt_project.foo, now snapshot.my_dbt_project.snappy)
00:52:46  Began compiling node snapshot.my_dbt_project.snappy
00:52:46  Timing info for snapshot.my_dbt_project.snappy (compile): 12:52:46.592971 => 12:52:46.596575
00:52:46  Began executing node snapshot.my_dbt_project.snappy
00:52:46  Writing runtime sql for node "snapshot.my_dbt_project.snappy"
00:52:46  Using snowflake connection "snapshot.my_dbt_project.snappy"
00:52:46  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
create or replace  table  development_jyeo.dbt_jyeo_snapshot.snappy
         as
        (

    select *,
        md5(coalesce(cast(id as varchar ), '')
         || '|' || coalesce(cast(updated_at as varchar ), '')
        ) as dbt_scd_id,
        updated_at as dbt_updated_at,
        updated_at as dbt_valid_from,
        nullif(updated_at, updated_at) as dbt_valid_to
    from (

select 1 id, 'alice' as name, '1970-01-01'::date as updated_at

    ) sbq

        );
00:52:46  Opening a new connection, currently in state closed
00:52:48  SQL status: SUCCESS 1 in 2.0 seconds
00:52:48  Timing info for snapshot.my_dbt_project.snappy (execute): 12:52:46.597681 => 12:52:48.875607
00:52:48  On snapshot.my_dbt_project.snappy: Close
00:52:49  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '0d5f2cf5-a986-4937-936d-a3a592f321cd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1274669d0>]}
00:52:49  2 of 2 OK snapshotted  development_jyeo.dbt_jyeo_snapshot.snappy ............... [SUCCESS 1 in 2.93s]
00:52:49  Finished running node snapshot.my_dbt_project.snappy
00:52:49  Connection 'master' was properly closed.
00:52:49  Connection 'snapshot.my_dbt_project.snappy' was properly closed.
00:52:49  
00:52:49  Finished running 1 incremental model, 1 snapshot in 0 hours 0 minutes and 17.12 seconds (17.12s).
00:52:49  Command end result
00:52:49  
00:52:49  Completed successfully
00:52:49  
00:52:49  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
00:52:49  Resource report: {"command_name": "build", "command_success": true, "command_wall_clock_time": 18.308712, "process_user_time": 3.371448, "process_kernel_time": 1.123735, "process_mem_max_rss": "212369408", "process_in_blocks": "0", "process_out_blocks": "0"}
00:52:49  Command `dbt build` succeeded at 12:52:49.535134 after 18.31 seconds
00:52:49  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1159e3710>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x110da21d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104772e10>]}
00:52:49  Flushing usage events

Both foo and snappy did a CTAS as expected, because they don't yet exist in the database.
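To see why the next run diverges, here is a purely illustrative sketch (not dbt's actual adapter code) of a relation cache keyed on the rendered database/schema/identifier strings. The snapshot's whitespace-padded `target_database` never matches what the adapter lists back from the warehouse, so the existence check misses and dbt falls back to create-or-replace:

```python
# Illustrative sketch only -- not dbt's real relation cache.
# dbt caches the relations it finds in the warehouse, then checks
# whether each node's target relation is already present.

def cache_key(database: str, schema: str, identifier: str) -> tuple:
    # Keys are built from the rendered config values verbatim,
    # so stray whitespace becomes part of the key.
    return (database.lower(), schema.lower(), identifier.lower())

# Relations as the warehouse reports them (no padding).
warehouse_cache = {
    cache_key("development_jyeo", "dbt_jyeo", "foo"),
    cache_key("development_jyeo", "dbt_jyeo_snapshot", "snappy"),
}

# The incremental model's database config rendered cleanly...
model_exists = cache_key("development_jyeo", "dbt_jyeo", "foo") in warehouse_cache

# ...but the snapshot's target_database rendered as ' development_jyeo'.
snapshot_exists = cache_key(" development_jyeo", "dbt_jyeo_snapshot", "snappy") in warehouse_cache

print(model_exists)     # True  -> insert/merge path
print(snapshot_exists)  # False -> create-or-replace (CTAS) path
```

Stripping the rendered value (as the fixed Jinja config does) would make both lookups hit.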

  4. Build for the second time:
$ dbt --debug build
00:53:41  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1089f7250>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x108ebed50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x108ebfc50>]}
00:53:41  Running with dbt=1.7.16
00:53:41  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'True', 'profiles_dir': '/Users/jeremy/.dbt', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'fail_fast': 'False', 'version_check': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'static_parser': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'introspect': 'True', 'target_path': 'None', 'invocation_command': 'dbt --debug build', 'send_anonymous_usage_stats': 'True'}
00:53:42  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'c71a4770-b5d3-47b3-a61a-01d7c5e0e278', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x126d96710>]}
00:53:42  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'c71a4770-b5d3-47b3-a61a-01d7c5e0e278', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x126dd2ed0>]}
00:53:42  Registered adapter: snowflake=1.7.5
00:53:42  checksum: b42e23759fbef548c33c6e49ebf263a93f046443b71b5f37fe49de1c6429a7cf, vars: {}, profile: , target: , version: 1.7.16
00:53:42  Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
00:53:42  Partial parsing enabled, no changes found, skipping parsing
00:53:42  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'c71a4770-b5d3-47b3-a61a-01d7c5e0e278', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x126d97ad0>]}
00:53:42  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'c71a4770-b5d3-47b3-a61a-01d7c5e0e278', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x126ee99d0>]}
00:53:42  Found 1 model, 1 snapshot, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
00:53:42  
00:53:42  Acquiring new snowflake connection 'master'
00:53:42  Acquiring new snowflake connection 'list_ development_jyeo'
00:53:42  Using snowflake connection "list_ development_jyeo"
00:53:42  On list_ development_jyeo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "connection_name": "list_ development_jyeo"} */
show terse schemas in database  development_jyeo
    limit 10000
00:53:42  Opening a new connection, currently in state init
00:53:43  SQL status: SUCCESS 44 in 2.0 seconds
00:53:43  On list_ development_jyeo: Close
00:53:44  Re-using an available connection from the pool (formerly list_ development_jyeo, now list_development_jyeo)
00:53:44  Using snowflake connection "list_development_jyeo"
00:53:44  On list_development_jyeo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_jyeo"} */
show terse schemas in database development_jyeo
    limit 10000
00:53:44  Opening a new connection, currently in state closed
00:53:46  SQL status: SUCCESS 44 in 2.0 seconds
00:53:46  On list_development_jyeo: Close
00:53:46  Re-using an available connection from the pool (formerly list_development_jyeo, now list_ development_jyeo_dbt_jyeo_snapshot)
00:53:46  Using snowflake connection "list_ development_jyeo_dbt_jyeo_snapshot"
00:53:46  On list_ development_jyeo_dbt_jyeo_snapshot: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "connection_name": "list_ development_jyeo_dbt_jyeo_snapshot"} */
show objects in  development_jyeo.dbt_jyeo_snapshot limit 10000
00:53:46  Opening a new connection, currently in state closed
00:53:48  SQL status: SUCCESS 1 in 2.0 seconds
00:53:48  On list_ development_jyeo_dbt_jyeo_snapshot: Close
00:53:49  Re-using an available connection from the pool (formerly list_ development_jyeo_dbt_jyeo_snapshot, now list_development_jyeo_dbt_jyeo)
00:53:49  Using snowflake connection "list_development_jyeo_dbt_jyeo"
00:53:49  On list_development_jyeo_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_jyeo_dbt_jyeo"} */
show objects in development_jyeo.dbt_jyeo limit 10000
00:53:49  Opening a new connection, currently in state closed
00:53:50  SQL status: SUCCESS 251 in 2.0 seconds
00:53:50  On list_development_jyeo_dbt_jyeo: Close
00:53:51  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'c71a4770-b5d3-47b3-a61a-01d7c5e0e278', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12724b710>]}
00:53:51  Concurrency: 1 threads (target='sf')
00:53:51  
00:53:51  Began running node model.my_dbt_project.foo
00:53:51  1 of 2 START sql incremental model dbt_jyeo.foo ................................ [RUN]
00:53:51  Acquiring new snowflake connection 'model.my_dbt_project.foo'
00:53:51  Began compiling node model.my_dbt_project.foo
00:53:51  Writing injected SQL for node "model.my_dbt_project.foo"
00:53:51  Timing info for model.my_dbt_project.foo (compile): 12:53:51.623003 => 12:53:51.627529
00:53:51  Began executing node model.my_dbt_project.foo
00:53:51  Using snowflake connection "model.my_dbt_project.foo"
00:53:51  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace  temporary view development_jyeo.dbt_jyeo.foo__dbt_tmp

   as (
    -- models/foo.sql

select 1 id
  );
00:53:51  Opening a new connection, currently in state init
00:53:53  SQL status: SUCCESS 1 in 2.0 seconds
00:53:53  Using snowflake connection "model.my_dbt_project.foo"
00:53:53  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo__dbt_tmp
00:53:54  SQL status: SUCCESS 1 in 0.0 seconds
00:53:54  Using snowflake connection "model.my_dbt_project.foo"
00:53:54  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
00:53:54  SQL status: SUCCESS 1 in 0.0 seconds
00:53:54  Using snowflake connection "model.my_dbt_project.foo"
00:53:54  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."FOO"
00:53:54  SQL status: SUCCESS 1 in 0.0 seconds
00:53:54  Writing runtime sql for node "model.my_dbt_project.foo"
00:53:54  Using snowflake connection "model.my_dbt_project.foo"
00:53:54  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
-- back compat for old kwarg name

  begin;
00:53:55  SQL status: SUCCESS 1 in 0.0 seconds
00:53:55  Using snowflake connection "model.my_dbt_project.foo"
00:53:55  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
insert into development_jyeo.dbt_jyeo.foo ("ID")
        (
            select "ID"
            from development_jyeo.dbt_jyeo.foo__dbt_tmp
        );
00:53:56  SQL status: SUCCESS 1 in 1.0 seconds
00:53:56  Using snowflake connection "model.my_dbt_project.foo"
00:53:56  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
COMMIT
00:53:57  SQL status: SUCCESS 1 in 1.0 seconds
00:53:57  Applying DROP to: development_jyeo.dbt_jyeo.foo__dbt_tmp
00:53:57  Using snowflake connection "model.my_dbt_project.foo"
00:53:57  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
drop view if exists development_jyeo.dbt_jyeo.foo__dbt_tmp cascade
00:53:57  SQL status: SUCCESS 1 in 0.0 seconds
00:53:57  Timing info for model.my_dbt_project.foo (execute): 12:53:51.628220 => 12:53:57.531981
00:53:57  On model.my_dbt_project.foo: Close
00:53:58  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'c71a4770-b5d3-47b3-a61a-01d7c5e0e278', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x126ed6810>]}
00:53:58  1 of 2 OK created sql incremental model dbt_jyeo.foo ........................... [SUCCESS 1 in 6.49s]
00:53:58  Finished running node model.my_dbt_project.foo
00:53:58  Began running node snapshot.my_dbt_project.snappy
00:53:58  2 of 2 START snapshot  development_jyeo.dbt_jyeo_snapshot.snappy ............... [RUN]
00:53:58  Re-using an available connection from the pool (formerly model.my_dbt_project.foo, now snapshot.my_dbt_project.snappy)
00:53:58  Began compiling node snapshot.my_dbt_project.snappy
00:53:58  Timing info for snapshot.my_dbt_project.snappy (compile): 12:53:58.123000 => 12:53:58.126751
00:53:58  Began executing node snapshot.my_dbt_project.snappy
00:53:58  Writing runtime sql for node "snapshot.my_dbt_project.snappy"
00:53:58  Using snowflake connection "snapshot.my_dbt_project.snappy"
00:53:58  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.7.16", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
create or replace  table  development_jyeo.dbt_jyeo_snapshot.snappy
         as
        (

    select *,
        md5(coalesce(cast(id as varchar ), '')
         || '|' || coalesce(cast(updated_at as varchar ), '')
        ) as dbt_scd_id,
        updated_at as dbt_updated_at,
        updated_at as dbt_valid_from,
        nullif(updated_at, updated_at) as dbt_valid_to
    from (

select 1 id, 'alice' as name, '1970-01-01'::date as updated_at

    ) sbq

        );
00:53:58  Opening a new connection, currently in state closed
00:54:00  SQL status: SUCCESS 1 in 2.0 seconds
00:54:00  Timing info for snapshot.my_dbt_project.snappy (execute): 12:53:58.127820 => 12:54:00.337249
00:54:00  On snapshot.my_dbt_project.snappy: Close
00:54:01  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'c71a4770-b5d3-47b3-a61a-01d7c5e0e278', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1272efe50>]}
00:54:01  2 of 2 OK snapshotted  development_jyeo.dbt_jyeo_snapshot.snappy ............... [SUCCESS 1 in 2.90s]
00:54:01  Finished running node snapshot.my_dbt_project.snappy
00:54:01  Connection 'master' was properly closed.
00:54:01  Connection 'list_development_jyeo_dbt_jyeo' was properly closed.
00:54:01  Connection 'snapshot.my_dbt_project.snappy' was properly closed.
00:54:01  
00:54:01  Finished running 1 incremental model, 1 snapshot in 0 hours 0 minutes and 18.93 seconds (18.93s).
00:54:01  Command end result
00:54:01  
00:54:01  Completed successfully
00:54:01  
00:54:01  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
00:54:01  Resource report: {"command_name": "build", "command_success": true, "command_wall_clock_time": 19.600353, "process_user_time": 2.802236, "process_kernel_time": 1.467897, "process_mem_max_rss": "205258752", "process_in_blocks": "0", "process_out_blocks": "0"}
00:54:01  Command `dbt build` succeeded at 12:54:01.084376 after 19.60 seconds
00:54:01  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1043b6e10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104520790>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104520750>]}
00:54:01  Flushing usage events
  1. foo already exists, so it is inserted into - expected.

  2. snappy already exists, but it is replaced entirely instead of merged into - unexpected.

  3. Do a quick switch over to dbt 1.8 and build again to see that the problem still exists:

$ dbt --debug build
00:57:01  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10772a110>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107721450>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107773d10>]}
00:57:01  Running with dbt=1.8.2
00:57:01  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/Users/jeremy/.dbt', 'fail_fast': 'False', 'warn_error': 'None', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'debug': 'True', 'version_check': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt --debug build', 'log_format': 'default', 'introspect': 'True', 'target_path': 'None', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}
00:57:02  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '4bf78b71-5a85-45ae-8050-280eb9245354', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107743d90>]}
00:57:02  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '4bf78b71-5a85-45ae-8050-280eb9245354', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1070ba4d0>]}
00:57:02  Registered adapter: snowflake=1.8.3
00:57:02  checksum: f0a141b9de567cb7f131613ba2682f40b3a1c6a157f81c6e5435862eff68f109, vars: {}, profile: , target: , version: 1.8.2
00:57:02  Unable to do partial parsing because of a version mismatch
00:57:02  Sending event: {'category': 'dbt', 'action': 'partial_parser', 'label': '4bf78b71-5a85-45ae-8050-280eb9245354', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x16adbf350>]}
00:57:03  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '4bf78b71-5a85-45ae-8050-280eb9245354', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x16b0e9990>]}
00:57:03  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '4bf78b71-5a85-45ae-8050-280eb9245354', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x16b6c2b90>]}
00:57:03  Found 1 model, 1 snapshot, 444 macros
00:57:03  
00:57:03  Acquiring new snowflake connection 'master'
00:57:03  Acquiring new snowflake connection 'list_development_jyeo'
00:57:03  Using snowflake connection "list_development_jyeo"
00:57:03  On list_development_jyeo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_jyeo"} */
show terse schemas in database development_jyeo
    limit 10000
00:57:03  Opening a new connection, currently in state init
00:57:06  SQL status: SUCCESS 44 in 3.0 seconds
00:57:06  On list_development_jyeo: Close
00:57:06  Re-using an available connection from the pool (formerly list_development_jyeo, now list_ development_jyeo)
00:57:06  Using snowflake connection "list_ development_jyeo"
00:57:06  On list_ development_jyeo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "connection_name": "list_ development_jyeo"} */
show terse schemas in database  development_jyeo
    limit 10000
00:57:06  Opening a new connection, currently in state closed
00:57:08  SQL status: SUCCESS 44 in 2.0 seconds
00:57:08  On list_ development_jyeo: Close
00:57:09  Re-using an available connection from the pool (formerly list_ development_jyeo, now list_ development_jyeo_dbt_jyeo_snapshot)
00:57:09  Using snowflake connection "list_ development_jyeo_dbt_jyeo_snapshot"
00:57:09  On list_ development_jyeo_dbt_jyeo_snapshot: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "connection_name": "list_ development_jyeo_dbt_jyeo_snapshot"} */
show objects in  development_jyeo.dbt_jyeo_snapshot limit 10000
00:57:09  Opening a new connection, currently in state closed
00:57:10  SQL status: SUCCESS 1 in 2.0 seconds
00:57:10  On list_ development_jyeo_dbt_jyeo_snapshot: Close
00:57:11  Re-using an available connection from the pool (formerly list_ development_jyeo_dbt_jyeo_snapshot, now list_development_jyeo_dbt_jyeo)
00:57:11  Using snowflake connection "list_development_jyeo_dbt_jyeo"
00:57:11  On list_development_jyeo_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_jyeo_dbt_jyeo"} */
show objects in development_jyeo.dbt_jyeo limit 10000
00:57:11  Opening a new connection, currently in state closed
00:57:13  SQL status: SUCCESS 251 in 2.0 seconds
00:57:13  On list_development_jyeo_dbt_jyeo: Close
00:57:13  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '4bf78b71-5a85-45ae-8050-280eb9245354', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x16b954850>]}
00:57:13  Concurrency: 1 threads (target='sf')
00:57:13  
00:57:13  Began running node model.my_dbt_project.foo
00:57:13  1 of 2 START sql incremental model dbt_jyeo.foo ................................ [RUN]
00:57:13  Re-using an available connection from the pool (formerly list_development_jyeo_dbt_jyeo, now model.my_dbt_project.foo)
00:57:13  Began compiling node model.my_dbt_project.foo
00:57:13  Writing injected SQL for node "model.my_dbt_project.foo"
00:57:13  Began executing node model.my_dbt_project.foo
00:57:13  Using snowflake connection "model.my_dbt_project.foo"
00:57:13  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace  temporary view development_jyeo.dbt_jyeo.foo__dbt_tmp

   as (
    -- models/foo.sql

select 1 id
  );
00:57:13  Opening a new connection, currently in state closed
00:57:15  SQL status: SUCCESS 1 in 2.0 seconds
00:57:15  Using snowflake connection "model.my_dbt_project.foo"
00:57:15  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo__dbt_tmp
00:57:15  SQL status: SUCCESS 1 in 0.0 seconds
00:57:15  Using snowflake connection "model.my_dbt_project.foo"
00:57:15  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
00:57:16  SQL status: SUCCESS 1 in 0.0 seconds
00:57:16  Using snowflake connection "model.my_dbt_project.foo"
00:57:16  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."FOO"
00:57:16  SQL status: SUCCESS 1 in 0.0 seconds
00:57:16  Writing runtime sql for node "model.my_dbt_project.foo"
00:57:16  Using snowflake connection "model.my_dbt_project.foo"
00:57:16  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
-- back compat for old kwarg name

  begin;
00:57:16  SQL status: SUCCESS 1 in 0.0 seconds
00:57:16  Using snowflake connection "model.my_dbt_project.foo"
00:57:16  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
insert into development_jyeo.dbt_jyeo.foo ("ID")
        (
            select "ID"
            from development_jyeo.dbt_jyeo.foo__dbt_tmp
        );
00:57:17  SQL status: SUCCESS 1 in 1.0 seconds
00:57:17  Using snowflake connection "model.my_dbt_project.foo"
00:57:17  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
COMMIT
00:57:18  SQL status: SUCCESS 1 in 0.0 seconds
00:57:18  Applying DROP to: development_jyeo.dbt_jyeo.foo__dbt_tmp
00:57:18  Using snowflake connection "model.my_dbt_project.foo"
00:57:18  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
drop view if exists development_jyeo.dbt_jyeo.foo__dbt_tmp cascade
00:57:18  SQL status: SUCCESS 1 in 0.0 seconds
00:57:18  On model.my_dbt_project.foo: Close
00:57:19  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '4bf78b71-5a85-45ae-8050-280eb9245354', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x16b956d90>]}
00:57:19  1 of 2 OK created sql incremental model dbt_jyeo.foo ........................... [SUCCESS 1 in 5.40s]
00:57:19  Finished running node model.my_dbt_project.foo
00:57:19  Began running node snapshot.my_dbt_project.snappy
00:57:19  2 of 2 START snapshot  development_jyeo.dbt_jyeo_snapshot.snappy ............... [RUN]
00:57:19  Re-using an available connection from the pool (formerly model.my_dbt_project.foo, now snapshot.my_dbt_project.snappy)
00:57:19  Began compiling node snapshot.my_dbt_project.snappy
00:57:19  Began executing node snapshot.my_dbt_project.snappy
00:57:19  Writing runtime sql for node "snapshot.my_dbt_project.snappy"
00:57:19  Using snowflake connection "snapshot.my_dbt_project.snappy"
00:57:19  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
create or replace  table  development_jyeo.dbt_jyeo_snapshot.snappy
         as
        (

    select *,
        md5(coalesce(cast(id as varchar ), '')
         || '|' || coalesce(cast(updated_at as varchar ), '')
        ) as dbt_scd_id,
        updated_at as dbt_updated_at,
        updated_at as dbt_valid_from,
        nullif(updated_at, updated_at) as dbt_valid_to
    from (

select 1 id, 'alice' as name, '1970-01-01'::date as updated_at

    ) sbq

        );
00:57:19  Opening a new connection, currently in state closed
00:57:21  SQL status: SUCCESS 1 in 2.0 seconds
00:57:21  On snapshot.my_dbt_project.snappy: Close
00:57:22  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '4bf78b71-5a85-45ae-8050-280eb9245354', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x16b6eda50>]}
00:57:22  2 of 2 OK snapshotted  development_jyeo.dbt_jyeo_snapshot.snappy ............... [SUCCESS 1 in 2.90s]
00:57:22  Finished running node snapshot.my_dbt_project.snappy
00:57:22  Connection 'master' was properly closed.
00:57:22  Connection 'snapshot.my_dbt_project.snappy' was properly closed.
00:57:22  
00:57:22  Finished running 1 incremental model, 1 snapshot in 0 hours 0 minutes and 18.54 seconds (18.54s).
00:57:22  Command end result
00:57:22  
00:57:22  Completed successfully
00:57:22  
00:57:22  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
00:57:22  Resource report: {"command_name": "build", "command_success": true, "command_wall_clock_time": 20.1641, "process_user_time": 3.268337, "process_kernel_time": 1.772584, "process_mem_max_rss": "208633856", "process_in_blocks": "0", "process_out_blocks": "0"}
00:57:22  Command `dbt build` succeeded at 12:57:22.058997 after 20.16 seconds
00:57:22  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107773610>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10776bd50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1030b53d0>]}
00:57:22  Flushing usage events

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11.9
- dbt: dbt-core==1.7.16 + dbt-snowflake==1.7.5 / dbt-core==1.8.2 + dbt-snowflake==1.8.3

Which database adapter are you using with dbt?

snowflake

Additional Context

If you do this on Postgres, you instead get a cross-database error, with the extra whitespace visible in the message:

Runtime Error
  ERROR: Cross-db references not allowed in postgres ( postgres vs postgres)
jeremyyeo commented 5 days ago

https://github.com/dbt-labs/dbt-adapters/blob/e964302121f70bacc8f827683c329e06696216cf/dbt/include/global_project/macros/materializations/snapshots/snapshot.sql#L11-L15

That is returning `none`, so `target_relation_exists` ends up `false`...

{% macro check() %}
    {# exact database name: the relation is found #}
    {% do log(adapter.get_relation(database="development_jyeo", schema="dbt_jyeo_snapshot", identifier="snappy")) %}
    {# leading whitespace in the database name: returns None #}
    {% do log(adapter.get_relation(database=" development_jyeo", schema="dbt_jyeo_snapshot", identifier="snappy")) %}
{% endmacro %}
$ dbt run-operation check
01:12:51  Running with dbt=1.8.2
01:12:52  Registered adapter: snowflake=1.8.3
01:12:52  Found 1 model, 1 snapshot, 446 macros
01:12:55  "DEVELOPMENT_JYEO"."DBT_JYEO_SNAPSHOT"."SNAPPY"
01:12:55  None
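To illustrate why the leading space makes the lookup miss: relation lookups ultimately key on exact database/schema/identifier strings, so `" development_jyeo"` and `"development_jyeo"` are simply different keys. This is a minimal sketch of that behavior, not dbt's actual cache code:

```python
# Illustrative sketch (NOT dbt's real relation cache): lookups are
# exact-match on the (database, schema, identifier) strings.
cache = {("development_jyeo", "dbt_jyeo_snapshot", "snappy"): "<relation>"}

def get_relation(database, schema, identifier):
    # A leading space in `database` produces a different tuple key,
    # so the pre-existing relation is not found and None is returned.
    return cache.get((database, schema, identifier))
```

With the exact name the relation is found; with `" development_jyeo"` the lookup returns `None`, mirroring the `run-operation` output above.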

Incremental models, by contrast, check for the existing relation via the cache: https://github.com/dbt-labs/dbt-adapters/blob/e964302121f70bacc8f827683c329e06696216cf/dbt/include/global_project/macros/materializations/models/incremental/incremental.sql#L5

dbeatty10 commented 15 hours ago

Thanks for reporting this @jeremyyeo -- we do not want it to be possible to overwrite a pre-existing snapshot and incur the accompanying data loss!

It looks to me like the Snowflake default quoting config is coming into play here.

So I added this to your reprex in order to be explicit:

# dbt_project.yml

# Turn off quoting (which is the default for Snowflake)
quoting:
  database: false
  schema: false
  identifier: false

In some local testing, it looked like updating this line to use Jinja's `trim` filter could fix the issue:

          database=model.database|trim,

However, I'm guessing that there is some other whitespace trimming happening for models but not for snapshots somewhere in our Python code. So rather than the macro fix above, we'd probably opt to trim the snapshot `target_database` value in Python instead.
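A Python-side fix might look like the sketch below. This is a hypothetical helper, not dbt-core's actual code; the function name and the dict-shaped config are assumptions for illustration:

```python
# Hypothetical sketch of trimming rendered snapshot config values in
# Python before they reach the adapter, so a stray newline or leading
# space from a multiline Jinja config cannot change the relation name.
def normalize_snapshot_config(config: dict) -> dict:
    for key in ("target_database", "target_schema"):
        value = config.get(key)
        if isinstance(value, str):
            # str.strip() removes leading/trailing whitespace, including
            # the trailing newline that YAML block scalars produce.
            config[key] = value.strip()
    return config
```

For example, `normalize_snapshot_config({"target_database": " development_jyeo\n"})` would yield `{"target_database": "development_jyeo"}`, matching what the relation cache expects.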