dbt-labs / dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
https://getdbt.com
Apache License 2.0

[Bug] Source metadata freshness is using the wrong field to calculate last updated #899

Open mikealfare opened 6 months ago

mikealfare commented 6 months ago

Is this a new bug in dbt-snowflake?

Current Behavior

Per @tjirab's comment at https://github.com/dbt-labs/dbt-snowflake/issues/785#issuecomment-1900357214:

Minor concern: LAST_ALTERED gets updated upon DML & DDL changes, and maintenance ops. It does not necessarily mean fresh data has become available.

Expected Behavior

When reporting data freshness, we would expect the timestamp to reflect only data updates, not every update to the object.

Steps To Reproduce

  1. Make a change to the dynamic table config (refresh frequency, warehouse, etc.)
  2. Ensure the dependent data sources have not seen new data
  3. Observe that the data freshness reports newer data because it picks up the config change from step 1.
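
The steps above can be sketched as a toy model in Python. This is only an illustration of the reported behavior, not dbt-snowflake's implementation: `TableState`, `last_data_load`, `alter_config`, `reported_age`, and `actual_age` are hypothetical names, and the only assumption taken from this issue is that LAST_ALTERED moves on config changes as well as on data changes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableState:
    """Toy stand-in for a table's metadata; not a Snowflake or dbt API."""

    last_altered: datetime    # moves on DML, DDL, and maintenance operations
    last_data_load: datetime  # hypothetical: moves only when rows change

    def load_data(self, at: datetime) -> None:
        # A data update moves both timestamps.
        self.last_altered = at
        self.last_data_load = at

    def alter_config(self, at: datetime) -> None:
        # A config change (refresh frequency, warehouse, ...) moves metadata only.
        self.last_altered = at


def reported_age(table: TableState, now: datetime) -> timedelta:
    """Age as a LAST_ALTERED-based freshness check would see it."""
    return now - table.last_altered


def actual_age(table: TableState, now: datetime) -> timedelta:
    """Age of the data itself."""
    return now - table.last_data_load


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
table = TableState(last_altered=t0, last_data_load=t0)

table.alter_config(t0 + timedelta(hours=23))  # step 1: config change
now = t0 + timedelta(hours=24)                # step 2: no new data arrived

print(reported_age(table, now))  # 1:00:00
print(actual_age(table, now))    # 1 day, 0:00:00
```

In this sketch the data is a day old, but a check based on `last_altered` would report it as one hour old.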

Relevant log output

No response

Environment

- OS: any
- Python: any
- dbt-core: any
- dbt-snowflake: 1.6+ (dynamic table support)

Additional Context

No response

mikealfare commented 6 months ago

After discussing this internally, we determined that it is a known and acceptable risk. We also determined that the documentation should have been updated to describe this scenario, but it was not. We will resolve this issue by noting the scenario in our docs.

The docs issue will be attached here once I have created it. In short, the assumption is that LAST_ALTERED should rarely reflect anything other than a data update. When it does, the situation resolves itself with the next data update. The impact is only material if the underlying data is rarely updated, and in those scenarios the data refresh timestamp seems less useful anyway.
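
A rough way to see this reasoning is to measure how far LAST_ALTERED runs ahead of the last real data update at a given moment. The sketch below is an assumption-laden illustration, not anything dbt computes: `freshness_gap` and the `"data"` / `"config"` event labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def freshness_gap(events, now):
    """Return how much newer the data looks than it is at `now`.

    `events` is a list of (timestamp, kind) pairs, where kind is "data"
    for a data update or "config" for any other change to the object.
    """
    seen = [(ts, kind) for ts, kind in events if ts <= now]
    last_altered = max(ts for ts, _ in seen)
    last_data = max(ts for ts, kind in seen if kind == "data")
    return last_altered - last_data


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Frequently updated source: the gap closes at the next data update.
frequent = [
    (t0, "data"),
    (t0 + timedelta(hours=2), "config"),
    (t0 + timedelta(hours=6), "data"),
]
print(freshness_gap(frequent, t0 + timedelta(hours=3)))  # 2:00:00
print(freshness_gap(frequent, t0 + timedelta(hours=7)))  # 0:00:00

# Rarely updated source: the gap can stay large for a long time.
rare = [
    (t0, "data"),
    (t0 + timedelta(days=20), "config"),
]
print(freshness_gap(rare, t0 + timedelta(days=21)))  # 20 days, 0:00:00
```

The gap can never exceed the time since the last data update, which is why it only becomes material for sources that are rarely updated.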