Closed github-christophe-oudar closed 2 months ago
@github-christophe-oudar Could you provide some use cases where you want to have a model self referencing itself? This is by default a pretty anti-pattern to dbt's idempotency.
Sure, I've few cases in production where I build self referencing models that are, for instance, daily partitioned tables that contains a 30 day sliding window on which I add new elements and expire old ones. Practically, I read my {{ this }} on previous day partition (where day = DATE_SUB({{ var("date") }}, INTERVAL 1 DAY)
) along expiration logic (AND event_ts > DATE_SUB(var("date") }}, INTERVAL 29 DAY)
) then it's idempotent to process the same partition as I'm inserting TIMESTAMP({{ var("date") }}) day
to insert in the right partition. I'm also able to join reference data that match the day using snapshots for instance.
Creating a single table makes it for downstream consumers (data apps) that just need to read the latest partition (or rollback to the previous one if something is fishy or provide a "time travel" experience).
Thank you for the explanation! That use case makes sense to me (I've done something similar before).That said, I will close this issue for now because we do have a broader project that we are currently working on to be announced close to Coalesce that should resolve this use case more delightfully.
Is this a new bug in dbt-bigquery?
Current Behavior
In some cases, models can be designed to self reference itself (
{{ this }}
) as part of a "sliding window" approach.Running that kind model AFTER adding a change to clustering/partitioning results in triggering: https://github.com/dbt-labs/dbt-bigquery/blob/0627aa2ec5283fdc3f7df43865d37e3612f3df5b/dbt/include/bigquery/macros/materializations/table.sql#L27-L30
dbt will drop your current "state" and your query will fail (because
{{ this }}
won't exist).If you add an
is_incremental
backup it might backfire as you would recreate the table "from scratch" without previous state making it a "silent" bug.It would happen both in
table
andincremental
materializationExpected Behavior
This code is still relevant as trying to reproduce the root cause would still throw something like:
I would suggest to:
Steps To Reproduce
A simple example would be to use following model:
Run the model.
Then you are adding a config block to
cluster_by = ['website']
.Rerun the model.
Relevant log output
No response
Environment
Additional Context
That bug is some unexpected "chaos engineering" that's not so easy to figure out