dbt-labs / dbt-project-evaluator

This package contains macros and models to find DAG issues automatically
https://dbt-labs.github.io/dbt-project-evaluator/latest/
Apache License 2.0
440 stars 64 forks source link

Snowflake extract function arguments are incorrectly flagged as hard coded references #410

Closed deanmorin closed 3 months ago

deanmorin commented 9 months ago

Describe the bug

The extract function is snowflake includes the keyword from. When the date_or_time_expr arg following this keyword is qualified with the table name and the closing parenthesis is on a new line, it is incorrectly flagged as a hardcoded reference.

Steps to reproduce

select
  1
  , extract(year from my_model.date_1)
  , extract(year from my_model.date_2
  )
  , extract(year from date_3
  )
from 
  {{ ref('my_model') }}

Only my_model.date_2 will be incorrectly flagged.

Expected results

No hard coded references found.

Actual results

my_model.date_2 is flagged as a hard coded reference.

Screenshots and log output

System information

The contents of your packages.yml file:

Which database are you using dbt with?

The output of dbt --version:

Core:
  - installed: 1.7.3
  - latest:    1.7.4 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - snowflake: 1.7.1 - Up to date!

Additional context

Are you interested in contributing the fix?

Unfortunately I do not have time.

graciegoheen commented 9 months ago

Explanation

This is getting caught by from_table_1 regex matching.

matches (from or join) followed by some spaces and then <something>.<something_else> 
where each <something> is enclosed by (` or [ or " or ' or nothing)

The reason my_model.date_2 is only being caught when the closing parenthesis is on a newline, is because the regex matching for from_table_1 ends with "followed by a whitespace character or end of string":

# eighth matching group
# 1 or 0 of (closing bracket, backtick, or quotation mark) followed by a whitespace character or end of string
([\]`\"\']?)(?=\s|$)

Troubleshooting

Did a bit of troubleshooting on this, and wanted to document a few solutions that don't work.

To reproduce this error, I updated the SQL in stg_model_4 to:

select 
    1 as id
    -- ,extract(year from stg_model_2.date_field
    -- ) as year
-- from {{ ref('stg_model_2') }}
  1. I tried to adjust the end of from_table_1 to instead be "not followed by a period"

    # eighth matching group
    # 1 or 0 of (closing bracket, backtick, or quotation mark) followed by a whitespace character or end of string
    ([\]`\"\']?)(?!\.)

    This didn't work because then, things like "my_db"."my_schema"."my_table" is being incorrectly flagged as "my_db"."my_schema.

  2. I tried to use a negative look behind function to say "exclude any statements following extract(...

    (?<!extract\([^\)]*)

    This didn't work because look behinds need to have a fixed width. Screenshot 2024-01-04 at 2 15 55 PM

I will keep thinking on this, but wanted to document some initial challenges. Open to other ideas for how we can prevent this from being incorrectly flagged!

github-actions[bot] commented 3 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 3 months ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.