dbt-labs / dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
https://getdbt.com
Apache License 2.0
296 stars 176 forks source link

[Bug] During `persist_docs`, col names in the schema yml file should always be uppercased if we don't have a `quote` config on the column #1156

Open jeremyyeo opened 3 months ago

jeremyyeo commented 3 months ago

Is this a new bug in dbt-snowflake?

Current Behavior

If users have columns.name key in a mixed case - dbt does not correctly match the uppercased column name that comes from a describe table ... with what is in the schema.yml file - therefor, column descriptions aren't added to the column.

Expected Behavior

During the "matching process" - we should always uppercase the columns.name key no matter the case if the user doesn't specify the quote config on the column (https://docs.getdbt.com/reference/resource-properties/quote).

Steps To Reproduce

Project setup:

# dbt_project.yml
name: my_dbt_project
profile: all
version: "1.0.0"

models:
   my_dbt_project:
      +materialized: table

# models/schema.yml
models:
  - name: foo
    columns:
      - name: mixed_Case
        description: Col description
-- models/foo.sql
{{ config(persist_docs={"columns": true}) }}
select 1 as mixed_Case

Run

$ dbt --debug run
22:05:24  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
22:05:24  Re-using an available connection from the pool (formerly list_development_jyeo_dbt_jyeo, now model.my_dbt_project.foo)
22:05:24  Began compiling node model.my_dbt_project.foo
22:05:24  Writing injected SQL for node "model.my_dbt_project.foo"
22:05:24  Began executing node model.my_dbt_project.foo
22:05:24  Writing runtime sql for node "model.my_dbt_project.foo"
22:05:24  Using snowflake connection "model.my_dbt_project.foo"
22:05:24  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
         as
        (
select 1 as mixed_Case
        );
22:05:24  Opening a new connection, currently in state closed
22:05:26  SQL status: SUCCESS 1 in 2.0 seconds
22:05:26  Using snowflake connection "model.my_dbt_project.foo"
22:05:26  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
22:05:26  SQL status: SUCCESS 1 in 0.0 seconds
22:05:26  Using snowflake connection "model.my_dbt_project.foo"
22:05:26  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
alter table development_jyeo.dbt_jyeo.foo alter

        "MIXED_CASE" COMMENT $$$$;
22:05:26  SQL status: SUCCESS 1 in 0.0 seconds
22:05:26  On model.my_dbt_project.foo: Close
22:05:27  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '102c02e3-7b4f-48a1-91be-ac0bb00f1129', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x13f491f50>]}
22:05:27  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 2.83s]

^ What happens here....

  1. We created the table column as select 1 as mixed_Case - and what happens is Snowflake has this column as a uppercased MIXED_CASE (this is the default behaviour of Snowflake).
  2. We run a describe table and the describe has returned that a column MIXED_CASE exist.
  3. This uppercase MIXED_CASE that comes from the describe, does not match what is in the schema.yml which has that as mixed_Case.
  4. "MIXED_CASE" != "mixed_Case" so we don't find a matching column description.
  5. Because there is no matching column description then the comment added is empty.

What can users do about this

(A) They can quote the column name in the model

-- models/foo.sql
{{ config(persist_docs={"columns": true}) }}
select 1 as "mixed_Case"
# models/schema.yml
models:
  - name: foo
    columns:
      - name: mixed_Case
        description: Col description
$ dbt --debug run

22:10:57  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
         as
        (
select 1 as "mixed_Case"
        );
22:10:57  Opening a new connection, currently in state closed
22:10:59  SQL status: SUCCESS 1 in 2.0 seconds
22:10:59  Using snowflake connection "model.my_dbt_project.foo"
22:10:59  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
22:10:59  SQL status: SUCCESS 1 in 0.0 seconds
22:10:59  Using snowflake connection "model.my_dbt_project.foo"
22:10:59  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
alter table development_jyeo.dbt_jyeo.foo alter

        "mixed_Case" COMMENT $$Col description$$;
22:10:59  SQL status: SUCCESS 1 in 0.0 seconds

Here Snowflake has created a literal mixed case column name mixed_Case - the describe returns this exactly and we get a match in the schema.yml file - "mixed_Case" = "mixed_Case" - therefor, we can apply the column description appropriately.

Cons

Their table will have a literal mixed case column name which they may not want.

(B) They can uppercase / lowercase the column name in the schema yaml

-- models/foo.sql
{{ config(persist_docs={"columns": true}) }}
select 1 as mixed_Case
# models/schema.yml
models:
  - name: foo
    columns:
      - name: mixed_case # This can also be 'MIXED_CASE' (all uppercase) - the net effect is the same.
        description: Col description
$ dbt --debug run
22:15:28  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
         as
        (
select 1 as mixed_Case
        );
22:15:28  Opening a new connection, currently in state closed
22:15:30  SQL status: SUCCESS 1 in 2.0 seconds
22:15:30  Using snowflake connection "model.my_dbt_project.foo"
22:15:30  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
22:15:30  SQL status: SUCCESS 1 in 0.0 seconds
22:15:30  Using snowflake connection "model.my_dbt_project.foo"
22:15:30  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
alter table development_jyeo.dbt_jyeo.foo alter

        "MIXED_CASE" COMMENT $$Col description$$;
22:15:31  SQL status: SUCCESS 1 in 0.0 seconds

For some reason, if the columns.name key is either:

dbt has some internal methods to match this correctly.

Cons

User have to have a different column casing style between their model code and their schema yml files.

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11
- dbt-core: 1.8.4
- dbt-snowflake: 1.8.3

Additional Context

If users don't specify the quote column config (https://docs.getdbt.com/reference/resource-properties/quote) - I propose we always uppercase the columns.name key like (B) above - even if the user is going to specify a mixed case column name in the schema yml file.

jeremyyeo commented 3 months ago

https://github.com/dbt-labs/dbt-snowflake/blob/7fb4549abc52d53dfdbf02da423ee58cc8696ccc/dbt/include/snowflake/macros/adapters.sql#L2-L7

^ Ah ok so here is why Case (B) works... we both upper and lower the column names and match it in the column dict... therefore always lowercased or UPPERCASED would match something for sure....


In the exact scenario above (mixed_Case in both model and yaml and not-quoted)... we can override the built in macro:

-- macros/get_column_comment_sql.sql
{% macro get_column_comment_sql(column_name, column_dict) -%}
  {#/* Create new uppercased key column dict. */#}
  {% set fixed_case_column_dict = {} %}
  {% for k, v in column_dict.items() %}
    {% do fixed_case_column_dict.update({k.upper(): v}) %}
  {% endfor %}

  {% if (column_name in fixed_case_column_dict) %}
    {% set matched_column = column_name -%}
  {% else %}
    {% set matched_column = None -%}
  {% endif %}

  {% if matched_column -%}
    {{ adapter.quote(column_name) }} COMMENT $${{ fixed_case_column_dict[matched_column]['description'] | replace('$', '[$]') }}$$
  {%- else -%}
    {{ adapter.quote(column_name) }} COMMENT $$$$
  {%- endif -%}
{% endmacro %}

Then:

$ dbt --debug run
01:31:36  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
         as
        (
select 1 as mixed_Case
        );
01:31:36  Opening a new connection, currently in state closed
01:31:37  SQL status: SUCCESS 1 in 2.0 seconds
01:31:38  Using snowflake connection "model.my_dbt_project.foo"
01:31:38  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
01:31:38  SQL status: SUCCESS 1 in 0.0 seconds
01:31:38  Using snowflake connection "model.my_dbt_project.foo"
01:31:38  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
alter table development_jyeo.dbt_jyeo.foo alter

  "MIXED_CASE" COMMENT $$Col description$$;
01:31:38  SQL status: SUCCESS 1 in 0.0 seconds
01:31:38  On model.my_dbt_project.foo: Close
01:31:38  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'e30c8d1b-5b26-4c3e-bc87-82e19958136a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x13938f0d0>]}
01:31:38  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 2.78s]