dbt-labs / dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
https://getdbt.com
Apache License 2.0
291 stars 175 forks source link

[Bug] Defaulting `reuse_connections` to `True` is breaking some users jobs #1167

Open jeremyyeo opened 3 months ago

jeremyyeo commented 3 months ago

Is this a new bug in dbt-snowflake?

Current Behavior

We made connection reuse the default in #1109 - this is breaking some users jobs when they depend on thread session isolation. We should make reuse_connections opt-in instead of opt-out.

Expected Behavior

Make this behave the way they were prior to making reuse_connections: True by default.

Steps To Reproduce

First, let's create a UDF to return the current session timezone - we will be using this in our model code later:

CREATE OR REPLACE FUNCTION development_jyeo.dbt_jyeo.get_current_timezone()
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS 
$$
  const timezone = Intl.DateTimeFormat().resolvedOptions().timeZone;
  return timezone;
$$
;

select development_jyeo.dbt_jyeo.get_current_timezone();
-- America/Los_Angeles

Make sure the account level timezone is America/Los_Angeles (Snowflakes default):

image

Let's setup our profiles.yml:

# .dbt/profiles.yml
sf:
  target: dev
  outputs:
    dev: 
      type: snowflake
      account: ...
      user: ...
      password: ...
      role: transformer
      database: development_jyeo
      schema: dbt_jyeo
      warehouse: analytics

^ Key here being that neither of these keys client_session_keep_alive, reuse_connections are set by the end user (dbt Cloud default profile).

Now, let's setup our dbt project.

# dbt_project.yml
name: my_dbt_project
profile: sf
config-version: 2
version: "1.0.0"

models:
   my_dbt_project:
      +materialized: table
      +pre-hook: "{{ check_tz() }}"
      +post-hook: "{{ check_tz() }}"

on-run-start: "{{ check_tz() }}"
on-run-end: "{{ check_tz() }}"

-- models/schema.yml
models:
  - name: a
    columns:
      - name: ctz
        data_tests:
          - accepted_values:
              values: ['Pacific/Auckland']
  - name: b
    columns:
      - name: ctz
        data_tests:
          - accepted_values:
              values: ['America/Los_Angeles']
-- macros/check_tz.sql
{% macro check_tz() %}
    {% set query %}
        SHOW PARAMETERS LIKE 'TIMEZONE' IN SESSION;
    {% endset %}
    {% if execute %}
        {% set results = run_query(query).columns[1].values()[0] %}
        {% do print('=========================================') %}
        {% do print('SESSION TIMEZONE IS: ' ~ results) %}
        {% do print('=========================================') %}
    {% endif %}
{% endmacro %}

-- models/a.sql
{{ config(sql_header="ALTER SESSION SET TIMEZONE = 'Pacific/Auckland';") }}
select current_timestamp()::text as c, development_jyeo.dbt_jyeo.get_current_timezone() as ctz

-- models/b.sql
select current_timestamp()::text as c, development_jyeo.dbt_jyeo.get_current_timezone() as ctz

The idea here is that only model a.sql should operate on timezone Pacific/Auckland - and model b.sql should follow the account timezone America/Los_Angeles.

First we build using 1.8.latest:

$ dbt build --threads 1
00:36:05  Running with dbt=1.8.5
00:36:05  Registered adapter: snowflake=1.8.3
00:36:06  Found 2 operations, 2 models, 2 data tests, 449 macros
00:36:06  
00:36:11  
00:36:11  Running 1 on-run-start hook
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:36:13  1 of 1 START hook: my_dbt_project.on-run-start.0 ............................... [RUN]
00:36:13  1 of 1 OK hook: my_dbt_project.on-run-start.0 .................................. [OK in 0.00s]
00:36:13  
00:36:14  Concurrency: 1 threads (target='sf')
00:36:14  
00:36:14  1 of 4 START sql table model dbt_jyeo.a ........................................ [RUN]
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
=========================================
SESSION TIMEZONE IS: Pacific/Auckland
=========================================
00:36:18  1 of 4 OK created sql table model dbt_jyeo.a ................................... [SUCCESS 1 in 4.24s]
00:36:18  2 of 4 START sql table model dbt_jyeo.b ........................................ [RUN]
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:36:22  2 of 4 OK created sql table model dbt_jyeo.b ................................... [SUCCESS 1 in 3.66s]
00:36:22  3 of 4 START test accepted_values_a_ctz__Pacific_Auckland ...................... [RUN]
00:36:24  3 of 4 PASS accepted_values_a_ctz__Pacific_Auckland ............................ [PASS in 2.61s]
00:36:24  4 of 4 START test accepted_values_b_ctz__America_Los_Angeles ................... [RUN]
00:36:27  4 of 4 PASS accepted_values_b_ctz__America_Los_Angeles ......................... [PASS in 2.48s]
00:36:27  
00:36:27  Running 1 on-run-end hook
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:36:28  1 of 1 START hook: my_dbt_project.on-run-end.0 ................................. [RUN]
00:36:28  1 of 1 OK hook: my_dbt_project.on-run-end.0 .................................... [OK in 0.00s]
00:36:28  
00:36:29  
00:36:29  Finished running 2 table models, 2 data tests, 2 project hooks in 0 hours 0 minutes and 23.22 seconds (23.22s).
00:36:29  
00:36:29  Completed successfully
00:36:29  
00:36:29  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

^ So far so good. The session that built model b was independent of the session that built model a - therefor changing the timezone on the session that built model a did not affect model b.

image

Now, we do a local install of dbt-snowflake from the main branch:

$ git rev-parse HEAD
3fbc0749491f40d34014336457753b140e1fb1ee

^ 3fbc0749491f40d34014336457753b140e1fb1ee is ahead of commit 6857e6b800329a6490c3339824249c18b1af5eda - which contains the commit that made reuse_connections True by default.

$ dbt build --threads 1
00:45:21  Running with dbt=1.9.0-a1
00:45:22  Registered adapter: snowflake=1.9.0-a1
00:45:22  Unable to do partial parsing because of a version mismatch
00:45:22  Found 2 models, 2 operations, 2 data tests, 452 macros
00:45:22  
00:45:25  
00:45:25  Running 1 on-run-start hook
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:45:27  1 of 1 START hook: my_dbt_project.on-run-start.0 ............................... [RUN]
00:45:27  1 of 1 OK hook: my_dbt_project.on-run-start.0 .................................. [OK in 0.00s]
00:45:27  
00:45:27  Concurrency: 1 threads (target='sf')
00:45:27  
00:45:27  1 of 4 START sql table model dbt_jyeo.a ........................................ [RUN]
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
=========================================
SESSION TIMEZONE IS: Pacific/Auckland
=========================================
00:45:30  1 of 4 OK created sql table model dbt_jyeo.a ................................... [SUCCESS 1 in 2.20s]
00:45:30  2 of 4 START sql table model dbt_jyeo.b ........................................ [RUN]
=========================================
SESSION TIMEZONE IS: Pacific/Auckland
=========================================
=========================================
SESSION TIMEZONE IS: Pacific/Auckland
=========================================
00:45:31  2 of 4 OK created sql table model dbt_jyeo.b ................................... [SUCCESS 1 in 1.74s]
00:45:31  3 of 4 START test accepted_values_a_ctz__Pacific_Auckland ...................... [RUN]
00:45:32  3 of 4 PASS accepted_values_a_ctz__Pacific_Auckland ............................ [PASS in 0.56s]
00:45:32  4 of 4 START test accepted_values_b_ctz__America_Los_Angeles ................... [RUN]
00:45:32  4 of 4 FAIL 1 accepted_values_b_ctz__America_Los_Angeles ....................... [FAIL 1 in 0.60s]
00:45:32  
00:45:32  Running 1 on-run-end hook
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:45:33  1 of 1 START hook: my_dbt_project.on-run-end.0 ................................. [RUN]
00:45:33  1 of 1 OK hook: my_dbt_project.on-run-end.0 .................................... [OK in 0.00s]
00:45:33  
00:45:34  
00:45:34  Finished running 2 table models, 2 data tests, 2 project hooks in 0 hours 0 minutes and 11.68 seconds (11.68s).
00:45:34  
00:45:34  Completed with 1 error and 0 warnings:
00:45:34  
00:45:34  Failure in test accepted_values_b_ctz__America_Los_Angeles (models/schema.yml)
00:45:34    Got 1 result, configured to fail if != 0
00:45:34  
00:45:34    compiled code at target/compiled/my_dbt_project/models/schema.yml/accepted_values_b_ctz__America_Los_Angeles.sql
00:45:34  
00:45:34  Done. PASS=3 WARN=0 ERROR=1 SKIP=0 TOTAL=4

^ Now things don't work so well anymore.

image

image

The session is shared and causes model b to no longer use the account timezone default.

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11
- dbt-core: 1.8.latest / 1.9.alpha
- dbt-snowflake: 1.8.latest / 1.9.alpha

Additional Context

In dbt Cloud, the debug logs show which commit of the adapter we are on. We have rolled forward and backward a few times this week.

As of this writing...

2024-08-13 00:04:11.218553 (MainThread): 00:04:11  Registered adapter: snowflake=1.9.0-post1+e5b28d67cf4b870da81674de3e1b84c4474a00a6

Snippet of debug logs from a dbt Cloud job run.

^ We are back on e5b28d - which is prior to commit 6857e6 (the one that introduced this default).

However, towards the end of last week, all throughout the weekend, we had been in a rolled forward state:

2024-08-11 07:31:34.545083 (MainThread): 07:31:34  Registered adapter: snowflake=1.9.0-post5+7fb4549abc52d53dfdbf02da423ee58cc8696ccc

7fb454 being ahead of 6857e6 - therefor some customers ran into this.

dbt Cloud users probably want to do:

image

In their extended attributes so that this doesn't automatically get turned on for them by default (whenever we roll forward again but have not yet fixed this issue of reuse_connections: True by default).

amychen1776 commented 2 months ago

@jeremyyeo is this annoying for Cloud users or a true bug? This seems like an enhancement request