[Bug] All test failures are stored even when a limit config is set

dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Apache License 2.0

9.66k stars 1.6k forks source link

Is this a new bug in dbt-core?

[X] I believe this is a new bug in dbt-core
[X] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Hi there, firstly thanks for the amazing job you're doing, I am huge fan of dbt !

I use dbt core with dbt-redshift.

I wrote a test on a simple seed to check the behavior of the _storefailures test config when used in addition to the limit config.

The seed I test (seeds/customers.csv) looks like this and has 100 rows (from ID 1 to 100):

ID,FIRST_NAME,LAST_NAME
1,Michael,P.
2,Shawn,M.
3,Kathleen,P.
4,Jimmy,C.
5,Katherine,R.
6,Sarah,R.
7,Martin,M.
8,Frank,R.
9,Jennifer,F.
10,Henry,W.

Download file here : customers.csv

The test definition (seeds/_seeds.yml) looks like this: TLDR : The "ID" column must be strictly lower than 50. Store the failures. But stop if more than 20 failures.

version: 2

seeds:
  - name: customers
    +quote_columns: false
    columns:
      - name: ID
        tests:
          - dbt_utils.accepted_range:
              name: my_test_with_limit_and_store_failures
              max_value: 50
              inclusive: true
              config:
                store_failures: true
                warn_if: "<5"
                error_if: ">=5"
                limit: 20

What I experience when running dbt test -s customers :

The test log seems OK
But my test audit table generated in my warehouse contains all the failed row (50) and not 20 (the limit specified).

Expected Behavior

I guess only the 20 first row must be stored in the audit table (based on the store_failure config documentation pictured here)

Steps To Reproduce

python3 -m venv .venv. 
.venv/bin/activate; pip install -r requirements.txt
dbt deps
dbt seed
dbt test -s customers

Where requirements.txt is

dbt-core==1.7.6
dbt-redshift==1.7.2

Relevant log output

No response

Environment

- OS:Ubuntu 22.04.3 LTS
- Python: 3.9.7
- dbt: 1.7.6

Which database adapter are you using with dbt?

redshift

Additional Context

I noticed this SQL file generated in the target folder if it can help

with meet_condition as( select * from "db"."feature_456"."customers" ), validation_errors as ( select * from meet_condition where -- never true, defaults to an empty result set. Exists to ensure any combo of the `or` clauses below succeeds 1 = 2 -- records with a value <= max_value are permitted. The `not` flips this to find records that don't meet the rule. or not ID <= 5 ) select * from validation_errors

select count(*) as failures, count(*) != 0 as should_warn, count(*) != 0 as should_error from ( select * from "db"."feature_456_dbt_test__audit"."my_test_with_limit_and_store_failures" limit 2 ) dbt_internal_test

dbt-labs / dbt-core