[X] I believe this is a new bug in dbt-expectations
[X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Any call to the regexp_instr macro will be valid for PostgreSQL. This is due to the generated expression and the handling of nulls by Postgres.
Dissecting the generated expression:
with grouped_expression as (
select array_length(
(
select regexp_matches(postal_code_5_3, '^\d{5}-\d{3}', '')
),
1
) > 0 as expression
from <db_name>.dbt_expectations_integration_tests.data_text
),
validation_errors as (
select *
from grouped_expression
where not(expression = true)
)
select *
from validation_errors
select regexp_matches('<valid_source>', '<valid_regex>', '<any_flags>') will always return either no results or a text[].
Getting the array_length of that will either be null or >0
Comparing that to 0 will return null or true respectively
Comparing that to true will once again return null or true respectively
Negating that will return null or false which have the same full filter effect in a where clause
Expected Behavior
regexp_instr should fail when regex doesn't match.
Steps To Reproduce
Change any regex in "dbt-expectations/integration_tests/models/schema_tests/schema.yml" with anything else (e.g. replace the '^\d{5}-\d{3}' regex with 'definitely_not_a_valid_postal_code') , the tests will still succeed.
Fix with the widest Postgres version compatibility range would probably be to coalesce the array_length
i.e. to replace
array_length((select regexp_matches({{ source_value }}, '{{ regexp }}', '{{ flags }}')), 1)
with
coalesce(array_length((select regexp_matches({{ source_value }}, '{{ regexp }}', '{{ flags }}')), 1), 0)
in "dbt-expectations/macros/regex/regexp_instr.sql"
Is this a new bug in dbt-expectations?
Current Behavior
Any call to the regexp_instr macro will be valid for PostgreSQL. This is due to the generated expression and the handling of nulls by Postgres.
Dissecting the generated expression:
select regexp_matches('<valid_source>', '<valid_regex>', '<any_flags>')
will always return either no results or a text[].null
or >0null
ortrue
respectivelynull
ortrue
respectivelynull
orfalse
which have the same full filter effect in a where clauseExpected Behavior
regexp_instr should fail when regex doesn't match.
Steps To Reproduce
Change any regex in "dbt-expectations/integration_tests/models/schema_tests/schema.yml" with anything else (e.g. replace the '^\d{5}-\d{3}' regex with 'definitely_not_a_valid_postal_code') , the tests will still succeed.
Environment
Which database adapter are you using with dbt?
Postgres
Suggested fix
Fix with the widest Postgres version compatibility range would probably be to coalesce the array_length i.e. to replace
array_length((select regexp_matches({{ source_value }}, '{{ regexp }}', '{{ flags }}')), 1)
withcoalesce(array_length((select regexp_matches({{ source_value }}, '{{ regexp }}', '{{ flags }}')), 1), 0)
in "dbt-expectations/macros/regex/regexp_instr.sql"