Open plaflamme opened 7 months ago
Adding a +1. It would be ideal to have a third model kind that combines the two SCD Type 2 model kinds, where I can specify both the date column (in this case a snapshot date or loaded-at timestamp) and the columns to check for changes. When a change is detected, the date column is used as the change time; if nothing changed, the row is ignored.
This PR: https://github.com/TobikoData/sqlmesh/pull/1997 adds a new way of maintaining an SCD Type 2 model by detecting changes to the source table's columns.
This issue is for tracking the idea of extending this behaviour to build a SCD Type 2 from a table that contains periodic snapshots of the source data.
Imagine a source table that looks like this:
And some periodic process that takes snapshots of this data and makes those snapshots available in another table, e.g.:
These snapshots allow tracking the changes that were made to the individual rows (by comparing the values), and they also contain a timestamp that can be used to determine when those changes occurred. As such, an SCD Type 2 dimension can be built from this data which might look like this:
Ideally, the new `SCD_TYPE_2_BY_COLUMN` model kind would allow specifying a column (`snapshot_date` in this case) as the timestamp to use for determining when a row has changed, instead of using `execution_time`.
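To make the request concrete, here is a minimal Python sketch of the derivation described above: walk the snapshots in `snapshot_date` order, compare the tracked columns against the currently open row for each key, and use the snapshot's own date as the change time. This is not SQLMesh code and says nothing about how the model kind would be implemented. The function name `build_scd2`, the sample rows, and the `id`/`name` columns are all made up for illustration; `valid_from`/`valid_to` are used only as illustrative output column names. Deleted rows (keys that disappear from later snapshots) are not handled.

```python
from datetime import date


def build_scd2(snapshots, key, columns, ts="snapshot_date"):
    """Collapse periodic snapshots into SCD Type 2 rows (illustrative only)."""
    history = []    # every SCD2 row produced, in creation order
    open_rows = {}  # key value -> the currently open (valid_to is None) row
    for snap in sorted(snapshots, key=lambda r: r[ts]):
        k = snap[key]
        values = {c: snap[c] for c in columns}
        current = open_rows.get(k)
        if current is not None and all(current[c] == values[c] for c in columns):
            continue  # tracked columns unchanged: ignore this snapshot row
        if current is not None:
            # Close the previous version at the snapshot's own timestamp,
            # not at the time this code happens to run.
            current["valid_to"] = snap[ts]
        row = {key: k, **values, "valid_from": snap[ts], "valid_to": None}
        history.append(row)
        open_rows[k] = row
    return history


# Hypothetical snapshot table: one row per source row per snapshot date.
snapshots = [
    {"id": 1, "name": "a", "snapshot_date": date(2024, 1, 1)},
    {"id": 2, "name": "x", "snapshot_date": date(2024, 1, 1)},
    {"id": 1, "name": "a", "snapshot_date": date(2024, 1, 2)},
    {"id": 2, "name": "x", "snapshot_date": date(2024, 1, 2)},
    {"id": 1, "name": "b", "snapshot_date": date(2024, 1, 3)},
    {"id": 2, "name": "x", "snapshot_date": date(2024, 1, 3)},
]

history = build_scd2(snapshots, key="id", columns=["name"])
```

With this sample data, the row with `id` 1 gets two versions (the first closed on the date of the snapshot where `name` changed), while the row with `id` 2 keeps a single open version because its tracked column never changes.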