dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.93k stars 1.63k forks source link

[Bug] unit test errors on join logic without table aliases #10068

Closed graciegoheen closed 6 months ago

graciegoheen commented 6 months ago

Is this a new bug in dbt-core?

Current Behavior

I’m trying to unit test my fct_orders model:

select
    order_id,
    customer_id,
    order_date,
    days_since_ordered,
    candy_name,
    candy_price,
    candy_type,
    case 
        when candy_type = 'bar' and is_golden_ticket_candidate then true 
        else false
    end as does_contain_golden_ticket,

from {{ ref('stg_orders') }}
left join {{ ref('stg_candies')}}
    on stg_orders.candy_id = stg_candies.candy_id

which contains join logic without table aliases.

I define the following unit test:

unit_tests:
  - name: test_does_contain_golden_ticket # this is the unique name of the test
    description: >
      An order can only contain a golden ticket if it's been selected as a golden ticket candidate
      and it's a bar.
    model: fct_orders # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_orders')
        rows:
         - {order_id: 1, candy_id: 1, is_golden_ticket_candidate: true}
         - {order_id: 2, candy_id: 2, is_golden_ticket_candidate: false}
         - {order_id: 3, candy_id: 3, is_golden_ticket_candidate: true}
         - {order_id: 4, candy_id: 4, is_golden_ticket_candidate: false}
      - input: ref('stg_candies')
        rows:
         - {candy_id: 1, candy_type: BaR} # mixed case "bar" candy_type
         - {candy_id: 2, candy_type: BaR} 
         - {candy_id: 3, candy_type: not a bar} # candy_type is something besides "bar"
         - {candy_id: 4, candy_type: not a bar} 
    expect: # the expected output given the inputs above
      rows:
        - {order_id: 1, does_contain_golden_ticket: true} # candidate and bar
        - {order_id: 2, does_contain_golden_ticket: false} # not a candidate and bar
        - {order_id: 3, does_contain_golden_ticket: false} # candidate and not a bar
        - {order_id: 4, does_contain_golden_ticket: false} # not a candidate and not a bar

I get the error:

   Database Error
    000904 (42000): SQL compilation error: error line 99 at position 7
    invalid identifier 'STG_ORDERS.CANDY_ID'     

From the SQL:

create or replace temporary table DEVELOPMENT.dbt_ggoheen.test_does_contain_golden_ticket__dbt_tmp
         as
        (select * from (
        with __dbt__cte__stg_orders as (

-- Fixture for stg_orders
select 

        try_cast('1' as NUMBER(38,0))
     as order_id, try_cast(null as NUMBER(38,0)) as customer_id, 

        try_cast('1' as NUMBER(38,0))
     as candy_id, 

        try_cast('True' as BOOLEAN)
     as is_golden_ticket_candidate, try_cast(null as DATE) as order_date, try_cast(null as NUMBER(9,0)) as days_since_ordered
union all
select 

        try_cast('2' as NUMBER(38,0))
     as order_id, try_cast(null as NUMBER(38,0)) as customer_id, 

        try_cast('2' as NUMBER(38,0))
     as candy_id, 

        try_cast('False' as BOOLEAN)
     as is_golden_ticket_candidate, try_cast(null as DATE) as order_date, try_cast(null as NUMBER(9,0)) as days_since_ordered
union all
select 

        try_cast('3' as NUMBER(38,0))
     as order_id, try_cast(null as NUMBER(38,0)) as customer_id, 

        try_cast('3' as NUMBER(38,0))
     as candy_id, 

        try_cast('True' as BOOLEAN)
     as is_golden_ticket_candidate, try_cast(null as DATE) as order_date, try_cast(null as NUMBER(9,0)) as days_since_ordered
union all
select 

        try_cast('4' as NUMBER(38,0))
     as order_id, try_cast(null as NUMBER(38,0)) as customer_id, 

        try_cast('4' as NUMBER(38,0))
     as candy_id, 

        try_cast('False' as BOOLEAN)
     as is_golden_ticket_candidate, try_cast(null as DATE) as order_date, try_cast(null as NUMBER(9,0)) as days_since_ordered
),  __dbt__cte__stg_candies as (

-- Fixture for stg_candies
select 

        try_cast('1' as NUMBER(38,0))
     as candy_id, try_cast(null as character varying(16777216)) as candy_name, 

        try_cast('BaR' as character varying(16777216))
     as candy_type, try_cast(null as NUMBER(16,2)) as candy_price
union all
select 

        try_cast('2' as NUMBER(38,0))
     as candy_id, try_cast(null as character varying(16777216)) as candy_name, 

        try_cast('BaR' as character varying(16777216))
     as candy_type, try_cast(null as NUMBER(16,2)) as candy_price
union all
select 

        try_cast('3' as NUMBER(38,0))
     as candy_id, try_cast(null as character varying(16777216)) as candy_name, 

        try_cast('not a bar' as character varying(16777216))
     as candy_type, try_cast(null as NUMBER(16,2)) as candy_price
union all
select 

        try_cast('4' as NUMBER(38,0))
     as candy_id, try_cast(null as character varying(16777216)) as candy_name, 

        try_cast('not a bar' as character varying(16777216))
     as candy_type, try_cast(null as NUMBER(16,2)) as candy_price
) select
    order_id,
    customer_id,
    order_date,
    days_since_ordered,
    candy_name,
    candy_price,
    candy_type,
    case 
        when candy_type = 'bar' and is_golden_ticket_candidate then true 
        else false
    end as does_contain_golden_ticket,

from __dbt__cte__stg_orders
left join __dbt__cte__stg_candies
    on stg_orders.candy_id = stg_candies.candy_id
    ) as __dbt_sbq
    where false
    limit 0

        );

stg_orders does have a candy_id column. but the query being generated does not reference stg_orders, it references __dbt__cte__stg_orders. so this join:

from __dbt__cte__stg_orders
left join __dbt__cte__stg_candies
    on stg_orders.candy_id = stg_candies.candy_id
    ) as __dbt_sbq

is failing.

This will work as long as I alias my table names:

select
    order_id,
    customer_id,
    order_date,
    days_since_ordered,
    candy_name,
    candy_price,
    candy_type,
    case 
        when candy_type = 'bar' and is_golden_ticket_candidate then true 
        else false
    end as does_contain_golden_ticket

from {{ ref('stg_orders') }} as stg_orders
left join {{ ref('stg_candies')}} as stg_candies
    on stg_orders.candy_id = stg_candies.candy_id

Expected Behavior

I should be able to unit test join logic without table aliases

Which database adapter are you using with dbt?

Snowflake

graciegoheen commented 6 months ago

I'm going to close this as won't do and document this as a known exception.

It is out best practice to alias table names.