calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0
1.01k stars 123 forks source link

Fix postgres__regexp_instr not validating regex #249 #250

Closed lookslikeitsnot closed 1 year ago

lookslikeitsnot commented 1 year ago

What does this PR do?

Fix regex match succeeding for any pattern in Postgres (#249) Add expected-to-fail tests for regex (#207)

Change description

Coalesce array_length and 0 to avoid null in expression when no match exists resulting in always falsy WHERE condition on error check in tests.

Type of change

Where has this been tested?

clausherther commented 1 year ago

Thanks for the PR! I'll look at this in more detail tomorrow, but looks like failed a test on BigQuery: https://app.circleci.com/pipelines/github/calogica/dbt-expectations/229/workflows/09eac6bd-ccaf-46b2-af4a-e2fe299a5fad/jobs/143?invite=true#step-106-354

clausherther commented 1 year ago

I think this (negative) test fails on BigQuery

- dbt_expectations.expect_column_values_to_not_match_regex_list:
              regex_list: ["[A-Z]", "[0-9]"]
              flags: i
              match_on: all
              config:
                error_if: "=0"
                warn_if: "<4"

because this expression returns true for all rows

select 
  regexp_instr(email_address, '[A-Z]', 1, 1) = 0
  and 
  regexp_instr(email_address, '[0-9]', 1, 1) = 0
  as expression
from dbt_expectations_integration_tests.data_text
lookslikeitsnot commented 1 year ago

Sorry about that, I missed the "The flags option is not supported for BigQuery and is being ignored." comment and forgot to add enabled: "{{ target.type in ['postgres', 'snowflake', 'redshift' ] }}" to new tests with flags.

This leads me to another question (out of scope of this ticket but might be nice to have): since BigQuery uses re2 and re2 supports flags by prepending the regex with (?flags:re), could we implement that?

clausherther commented 1 year ago

This leads me to another question (out of scope of this ticket but might be nice to have): since BigQuery uses re2 and re2 supports flags by prepending the regex with (?flags:re), could we implement that?

Sounds good! I'm not super hip re: Regex options and flags, but if that'd be useful, I'd say go ahead and open an issue for it and we'll go from there. Thanks!