New Source Freshness test -- how to exclude static tables that are sourced from a database

dbt-labs / dbt-project-evaluator

This package contains macros and models to find DAG issues automatically

https://dbt-labs.github.io/dbt-project-evaluator/latest/

Apache License 2.0

440 stars 64 forks source link

New Source Freshness test -- how to exclude static tables that are sourced from a database #490

Closed tuday2 closed 1 month ago

tuday2 commented 1 month ago

Describe the bug

I recently updated to 14.x and while I use source freshness for most of my tables, I have a table that is sourced from a database that has static data. How do I exclude this table from the source freshness test? I've tried to add it to my dbt_project_evaluator_exceptions seed however it doesn't seem to work.

Steps to reproduce

N/A

Expected results

N/A

Actual results

N/A

Screenshots and log output

N/A

System information

  - package: dbt-labs/dbt_utils
    version: 1.3.0
  - package: dbt-labs/dbt_project_evaluator
    version: 0.14.1

Which database are you using dbt with?

[ ] postgres
[ ] redshift
[ ] bigquery
[X] snowflake
[ ] trino/starburst
[ ] other (specify: ____)

The output of dbt --version:

Versionless (Cloud)

b-per commented 1 month ago

Hi!

Using the dbt_project_evaluator_exceptions seed should work like for other tests.

could you confirm that the seed has been loaded before running the source freshness test?
if so, could you please share
1. the output of your fct_sources_without_freshness table after the test failure?
2. the seed configuration for that test

tuday2 commented 1 month ago

I am using the seed for 3 other tests that are working correctly so I am certain its loading in the correct order:

Seed:

fct_name,column_name,id_to_exclude,comment
fct_root_models,child,dim_date,Date dimensions generated with dbt_utils date_spine
fct_root_models,child,dim_date_iso,Date dimensions generated with dbt_utils date_spine
fct_multiple_sources_joined,child,stg_%_unioned,Models called _unioned can union multiple sources
fct_sources_without_freshness,child,stg_XXX_historical_calls,Exclude static table that has legacy calls for XXX

(I also tried using the source name "legacy.XXX_historical_calls" in place of the staging table name and that gave the same error)

Output from FCT_SOURCES_WITHOUT_FRESHNESS table (1 row): legacy.XXX_HISTORICAL_CALLS

b-per commented 1 month ago

What about

fct_name,column_name,id_to_exclude,comment
...
fct_sources_without_freshness,resource_name,legacy.XXX_HISTORICAL_CALLS,Exclude static table that has legacy calls for XXX

The column name in the fct is not child for this table, it is resource_name

tuday2 commented 1 month ago

That did it! thank you.

I had to use the exact case -- which normally I don't since I'm using Snowflake, however this is a reference to a source so that makes sense.

Is "resource_name" in the documentation somewhere? All the examples show "child" -- it may help to add a few more examples in the docs.

fct_sources_without_freshness,resource_name,legacy.XXX_HISTORICAL_CALLS,Exclude static table that has legacy calls for XXX

b-per commented 1 month ago

resource_name itself is not documented as it would depend on each table, but we have this in the docs

column_name: the column name from fct_name we will be looking at to define exceptions

I might add another example in the docs where the column is not called child

tuday2 commented 1 month ago

What do you mean it would depend on each table? i literally put "resource_name" in there

b-per commented 1 month ago

I mean that to add exceptions for the table fct_sources_without_freshness, there is only one filter, resource_name.

But any column returned from any of the fct_ tables can be used to set exceptions.

Taking fct_exposures_dependent_on_private_models for example, we return the columns exposure_name, parent_resource_name, parent_access and parent_resource_type

Any of those columns can be used in the seed if we wanted to exclude specific rows.

Does it make sense?

tuday2 commented 1 month ago

Wow ok -- I never made the correlation between that and the name of the column in the evaluator table -- all makes sense now.

I honestly thought child was some default for a fact table --- just never clicked. I see now the name of the column in the evaluator tables is what you reference.

All good -- thanks again for the quick help!