dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.96k stars 1.63k forks source link

[Bug] Incremental table allows invalid SQL #10827

Closed jeremy-thomas-roc closed 1 month ago

jeremy-thomas-roc commented 1 month ago

Is this a new bug in dbt-core?

Current Behavior

I have a query in an incremental table that I am changing to now materialize as a view. When I update the materialization, the model no longer builds successfully, and throws an error about my SQL being invalid. Building the model as incremental either with or without full-refresh works with no issues, and in fact, the column that is part of the invalid SQL appears to be in working order in the incremental table. The SQL also throws the same error if I try to change the materialization to table, so it isn't a difference between views and tables.

Expected Behavior

Materializations should compile and run consistently across all types (within the boundaries of the platform)

Steps To Reproduce

{{ config(materialized="incremental", ...) }}

select
    c.customer_id,
    count(distinct order_id) as n_orders
from {{ ref("customers") }} c
left join {{ ref("orders") }} o using (customer_id)
group by c.customer_id, n_orders -- this is invalid, grouping by the aggregate

this compiles, runs, and n_orders is correct in the resulting table.

{{ config(materialized="view") }}

select
    c.customer_id,
    count(distinct order_id) as n_orders
from {{ ref("customers") }} c
left join {{ ref("orders") }} o using (customer_id)
group by c.customer_id, n_orders 

this correctly throws the error [N_ORDERS] is not a valid group by expression

Relevant log output

No response

Environment

- OS: multiple: Ubunutu 22.04, Mac OS
- Python: 3.10.13
- dbt: 1.7.4

Which database adapter are you using with dbt?

snowflake

Additional Context

I'm not ashamed to say that I am entirely baffled by this behavior. 4 different additional people on my team that I would consider to be more than proficient in SQL looked at this with me, and none of us could begin to explain how or why this is happening. The invalid code was living and working in production for multiple months, incrementing hourly successfully, and we only found it when I went to change the materialization due to an unrelated issue. Even if this is not strictly a dbt issue, if anyone has any insight into how this is happening, it would be greatly appreciated.

dbeatty10 commented 1 month ago

Thanks for reporting this @jeremy-thomas-roc !

I tried to replicate this, but I wasn't able: I got a [N_ORDERS] is not a valid group by expression error no matter the materialization ("incremental" vs. "view").

Out of curiosity, what does it do if you add the --no-partial-parse flag to the end of your command?

jeremy-thomas-roc commented 1 month ago

@dbeatty10 I'll be honest, this was the exact response I was expecting, because it's such an unusual error.

I created the minimal reproducible example for illustrative purposes, but I will tinker with the real code and test that flag out, and come back with some more substantive troubleshooting within the next couple of days.

jeremy-thomas-roc commented 1 month ago

@dbeatty10 I'll be completely honest, I spent an hour copying bits of code into another file, including a full copy of the offending file, and I couldn't recreate either. I would simply blame odd caching behavior if I could only make this happen locally, but it happened in cloud hosted pipelines, so I'm still at a loss. However, without being able to recreate, I'll close this for now and if it ever reappears, I'll re-open and we can revisit.

Thanks for taking the time to test this!