dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Feature] Raise an error for unit tests that don't contain all filtering columns #10649

Closed leonardobocci closed 3 weeks ago

leonardobocci commented 4 weeks ago

Is this a new bug in dbt-core?

Current Behavior

When a fixture omits a column that the model filters on, the unit test fails in a way that is hard to debug: nothing in the output points to the missing column.

Expected Behavior

When a dbt model filters on a column, the unit test should raise an error if an input fixture does not contain that column.

Steps To Reproduce

  1. Model:

    with
    foo as (
        select col1, col2 from {{ ref('foo') }} where filtercol is not null
    ),

    bar as ( Transformations here )

    select * from bar

  2. Unit test:

    unit_tests:

Relevant log output

Note: the diff below shows only the expected rows. Because the input fixture omits the filtering column, the model selected no rows, so there is no actual output to compare.

15:38:29  Finished running 1 unit test in 0 hours 0 minutes and 1.46 seconds (1.46s).
15:38:29  Completed with 1 error and 0 warnings:
15:38:29  Failure in unit_test foobar (models/unit_tests.yaml)
15:38:29    
actual differs from expected:
@@ ,col1,col2
---,1,   2
---,3,   4

Environment

- OS: Ubuntu 22.04
- Python: 3.11
- dbt: 1.8.6

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

Adapter used: Clickhouse

dbeatty10 commented 3 weeks ago

Thanks for raising this @leonardobocci !

I switched this from a bug report to a feature request since dbt is behaving the way we expect. Namely, for any columns that are not specified in a unit testing fixture, dbt takes that to mean the value doesn't matter and it will supply an arbitrary value to the transformation model. The default arbitrary value is null, but some adapters may choose to override this default. If any value is relevant to your expected output (like filtercol in your example!), then you'll need to supply it. dbt doesn't know which columns are important or not (and my initial guess is that it can't know).

Either way, we aren't planning to do anything fancier in this regard than we're doing currently, so I'm going to close this as "not planned".
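
For anyone hitting the same failure, the workaround described above is to set the filtering column explicitly in the input fixture. The following is a minimal sketch rather than the reporter's actual test. The test name `foobar` comes from the log output, while the model name `my_model`, the input row values, and the non-null `filtercol` value of 1 are assumptions made for illustration. It also assumes the elided transformations pass `col1` and `col2` through unchanged.

```yaml
unit_tests:
  - name: foobar
    model: my_model              # hypothetical model name, not given in the report
    given:
      - input: ref('foo')
        rows:
          # filtercol is set to a non-null value so the rows survive
          # the "where filtercol is not null" filter in the model.
          - {col1: 1, col2: 2, filtercol: 1}
          - {col1: 3, col2: 4, filtercol: 1}
    expect:
      rows:
        - {col1: 1, col2: 2}
        - {col1: 3, col2: 4}
```

If `filtercol` is left out of the input rows, dbt fills it with the adapter's default arbitrary value (null unless the adapter overrides it), every row is filtered out, and the diff shows only the expected rows, as in the log above.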