dbt-labs / dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
https://getdbt.com
Apache License 2.0
296 stars 176 forks source link

[Regression] [Bug] Incorrect column order in view after successful job run #1180

Open crystalro0 opened 2 months ago

crystalro0 commented 2 months ago

Is this a new bug in dbt-snowflake?

Current Behavior

hi all! I have a customer that is experiencing incorrect column order in their Cloud jobs which is causing errors. They've reported to Snowflake and they in turn identified a possible issue with the way we create the view/table. Currently, they're seeing a SELECT * during the create or replace.

Seeing {% set model_columns = model.columns %} here: https://github.com/dbt-labs/dbt-snowflake/blob/3fbc0749491f40d34014336457753b140e1fb1ee/dbt/include/snowflake/macros/relations/view/create.sql#L13

They're on versionless and on Snowflake. I've been unable to reproduce the issue with a similar source table and ref'd model.

Expected Behavior

Columns in order as per the table/view structure or query updates.

Steps To Reproduce

  1. Create source table with 10+ columns.
  2. CTE A to ref the source and CTE B transform in int model to have SELECT * FROM CTE A returned and excluding one of the columns.
  3. add CTE C to select * from CTE B with 2 new columns appended.
  4. add final select * from CTE C.

Relevant log output

No response

Environment

- dbt-core: Versionless
- dbt-snowflake: 1.9.0x

Additional Context

Needs refinement - may be generic dbt adapters issue.

amychen1776 commented 2 months ago

@crystalro0 just so I fully understand the issue, basically what is happening is the order in which the columns are displayed are not in the order in which they are declared from the initial CTE A (as is carried into CTE B --> CTE C + 2 new columns)?

Did they just start to see this happening on versionless? Also what is the impact of this on their project?

crystalro0 commented 2 months ago

@amychen1776 correct, then their column values get mixed up. Yes, on versionless. The impact is the wrong data in the wrongly positioned column, so their downstream model fails due to the switch of type of value. I'm not sure on the actual percentage, but I would say there are intermittent job failures with that downstream model. As a quick aside, the user has created a feature request with Snowflake as well. *Edit: I do have their compiled queries if needed.

amychen1776 commented 2 months ago

@crystalro0 apologies - I might be missing something; what do you mean by wrong data in the wrong column?

crystalro0 commented 2 months ago

@amychen1776 no worries! that downstream column is expecting a numeric datatype and ends up with a text field. also including their description from the ticket:

While creating the model in question, dbt is generating changing column order in the create or replace view clause where all columns are listed. This is making the model contain wrong column data for wrong column name. This further makes the downstream model build fail where a numeric column has suddenly text data in it. This leads to job failures or successes depending on whether the column name and its data are correct or wrong.

amychen1776 commented 1 month ago

Thank you! That's super helpful

colin-rogers-dbt commented 1 month ago

@crystalro0 what version did their project work with? Does this behavior happen in 1.7 as well?

colin-rogers-dbt commented 1 month ago

Likely culprit is that the call we make to retrieve the columns is non-deterministic in the order of the columns or that we are changing the sort after the fact, we need to look at preserving a given order.

crystalro0 commented 3 weeks ago

@crystalro0 what version did their project work with? Does this behavior happen in 1.7 as well?

sorry @colin-rogers-dbt I missed this! I believe it was working with 1.7 before the move to versionless.