calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0

Expect_Column_Values_To_Match_Regex Test is Failing with an argument error #274

Open brian-custer opened 1 year ago

brian-custer commented 1 year ago

Is this a new bug in dbt-expectations?

- I believe this is a new bug

I have defined the test using the following code in my models yaml file:

Expected Behavior

I expect the test to work.

Steps To Reproduce

Configure your test like the above in a models yaml file.

Relevant log output

The log output is: Error executing test: regexp_instr requires 2 arguments but 4 were given.

Environment

The environment is VS Code and dbt Core.

- OS: Windows 11
- Python: 3.10
- dbt: dbt-core 1.5.4
- adapter: dbt-databricks 1.5.5
- dbt-expectations:

Which database adapter are you using with dbt?

dbt-databricks 1.5.5

Note: dbt-expectations currently does not support database adapters other than the ones listed below.

Additional Context

I am using the dbt-sparkutils shim to compensate for the fact that dbt-expectations doesn't run on Databricks.

clausherther commented 1 year ago

Hi @brian-custer! Doesn't look like dbt-sparkutils overrides any dbt-expectations macros, and the one that's causing your issue is dbt_expectations.regexp_instr, which expects 4 parameters for the default implementation. We don't have a spark implementation for this at the moment since we don't have a CI/CD environment for spark set up. Best bet at the moment is to add a shim for dbt_expectations to dbt-sparkutils.

brian-custer commented 1 year ago

I've done that and it is still failing with the error I gave you. Any ideas how we can coax the test into working?

clausherther commented 1 year ago

Sorry, not sure I'm following. What exactly have you already done?

brian-custer commented 1 year ago

I stated what I had done in the issue. I have configured the test as shown in the issue and it errors out with the error regarding too many arguments for the regexp_instr function. I've inspected the compiled test and sure enough it is putting two 1's in addition to the column expression and the regex. Databricks is throwing the exception.

Thanks,


clausherther commented 1 year ago

Right, your issue is that the dbt-sparkutils package does nothing to help you run dbt-expectations on Databricks, since it doesn't implement any shims for it. Unless you or someone else adds Spark support for regexp_instr to dbt-sparkutils, you're going to continue getting this error. Your other option is to implement the shim locally in your project.

brian-custer commented 1 year ago

That's not the impression I got when I installed it in my project. It explicitly said to install the spark_utils package, which would shim the expectations package. It did not say anything about coding this myself. I've had good luck running other expectations tests, so I'm surprised that this one fails.

Thanks,


clausherther commented 1 year ago

We actually removed the reference to spark-utils in the README when we deprecated support for dbt-utils back in Nov '22 (#217 https://github.com/calogica/dbt-expectations/pull/217/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L44)

bry890 commented 1 year ago

Hi y'all

I had the same issue with the dbt_expectations.expect_column_values_to_match_regex test on Databricks. As @clausherther mentioned, the problem seems to be that the Databricks regexp_instr function only accepts two arguments, whereas the default implementation passes in four.

As a quick fix, I added the following macro in my project:


-- myproject/macros/databricks__regexp_instr.sql

{% macro databricks__regexp_instr(source_value, regexp, position, occurrence, is_raw, flags) %}
    -- Put your Databricks-compatible regexp_instr call here
    -- This is just an example; you'll need to modify it based on your needs and if your regexp is raw or not
    -- https://docs.databricks.com/en/sql/language-manual/functions/regexp_instr.html
    -- https://docs.databricks.com/en/sql/language-manual/data-types/string-type.html
    regexp_instr({{ source_value }}, '{{ regexp }}')
{% endmacro %}

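
One way to act on the "raw or not" caveat in the comments above is to branch on is_raw. This is only a minimal sketch, not the package's implementation. It assumes the same six-argument signature as the macro above, and that the Databricks runtime in use accepts the r'...' raw-literal prefix described in the string-type docs linked above. The position, occurrence, and flags arguments are accepted but ignored, because the Databricks function takes only a string and a pattern.

```sql
{# myproject/macros/databricks__regexp_instr.sql (same hypothetical path as above) #}
{% macro databricks__regexp_instr(source_value, regexp, position, occurrence, is_raw, flags) %}
    {# position, occurrence and flags are ignored: Databricks regexp_instr takes (str, regexp) only #}
    {% if is_raw %}
        {# assumption: the runtime supports raw string literals, so backslashes are taken literally #}
        regexp_instr({{ source_value }}, r'{{ regexp }}')
    {% else %}
        {# regular literal: backslashes in the pattern must already be escaped by the caller #}
        regexp_instr({{ source_value }}, '{{ regexp }}')
    {% endif %}
{% endmacro %}
```

Because position and occurrence are dropped, this only behaves like the default implementation when both are 1, which matches the two 1's seen in the compiled test earlier in this thread.
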
brian-custer commented 1 year ago

Thanks for the info. I'll do that and see if I can get it to work.

Thanks,


clausherther commented 1 year ago

FYI, support for Spark in dbt-date released today, working on Spark support for dbt-expectations. See https://getdbt.slack.com/archives/CU4MRJ7QB/p1692723790034329.

clausherther commented 1 year ago

If anyone has experience with Regex parsing in dbt-spark, I'd appreciate the assist here: https://getdbt.slack.com/archives/CNGCW8HKL/p1692733472369839

brian-custer commented 1 year ago

Thanks, good to know. I'll keep an eye out for an update.

Thanks,
