dataform-co / dataform-scd

Common data models for creating type-2 slowly changing dimensions tables from mutable data sources in Dataform.
https://dataform.co
MIT License

Handling new insertion on existing SCD table #1

Open adiwijaya opened 2 years ago

adiwijaya commented 2 years ago

The SCD library doesn't seem to handle new rows (insertions).

For example, the existing source table has rows with ids 1, 2, 3.

After the initial Dataform run, table_scd_updates has rows with ids 1, 2, 3.

When a new row is added, the source table has ids 1, 2, 3, 4.

The current logic doesn't handle the new row (id 4).

File: dataform-scd/index.js

Code: under the comment "// Create an incremental table with just pure updates, for a full history of the table." ... the filter "where ${timestamp} > (select max(${timestamp}) from ${ctx.self()})"

The above logic will only insert a new row when its id already exists from the initial SCD creation.

Is this intentional or a bug?

How about adding an additional clause, e.g. "where ${timestamp} > (select max(${timestamp}) from ${ctx.self()}) OR ${uniqueKey} NOT IN (select distinct ${uniqueKey} from ${ctx.self()})"?
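
As a rough sketch of the clause suggested above (not the library's actual code), the filter could be assembled like this. The helper name `incrementalFilter` and the example column and table names are hypothetical; `self` stands in for whatever `ctx.self()` resolves to.

```javascript
// Hypothetical helper: builds the incremental filter with the extra
// "key not seen yet" condition proposed in this issue.
// `timestamp` and `uniqueKey` are column names; `self` is a placeholder
// for the value that ctx.self() would return inside Dataform.
function incrementalFilter(timestamp, uniqueKey, self) {
  return [
    `where ${timestamp} > (select max(${timestamp}) from ${self})`,
    `   or ${uniqueKey} not in (select distinct ${uniqueKey} from ${self})`,
  ].join("\n");
}

// Example with made-up names, printed for inspection.
console.log(incrementalFilter("updated_at", "id", "table_scd_updates"));
```

The second condition is meant to let through rows whose key has never been captured, whatever their timestamp is.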

kkulczak commented 2 years ago

I cannot reproduce the bug.

In my experiments, the new row (id 4) was inserted correctly.
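
Neither comment says which timestamp the new row carried. One possible reading, not confirmed in this thread, is that the quoted filter depends only on the timestamp and not on the id, so the outcome differs with the new row's timestamp. The sketch below models that filter over in-memory arrays; the function name, the field `updatedAt`, and the numeric timestamps are all made up for illustration.

```javascript
// In-memory model of the filter quoted in the issue:
//   where timestamp > (select max(timestamp) from self)
// `existingUpdates` plays the role of the already-built updates table.
function rowsPassingFilter(sourceRows, existingUpdates) {
  const maxTs = Math.max(...existingUpdates.map((r) => r.updatedAt));
  return sourceRows.filter((r) => r.updatedAt > maxTs);
}

// Rows captured by the initial run (hypothetical timestamps).
const existing = [
  { id: 1, updatedAt: 100 },
  { id: 2, updatedAt: 110 },
  { id: 3, updatedAt: 120 },
];

// Case A: id 4 arrives with a timestamp newer than anything captured so far.
const newerSource = [...existing, { id: 4, updatedAt: 130 }];
// Case B: id 4 arrives with a timestamp at or before the current maximum.
const olderSource = [...existing, { id: 4, updatedAt: 90 }];

const passedA = rowsPassingFilter(newerSource, existing).map((r) => r.id);
const passedB = rowsPassingFilter(olderSource, existing).map((r) => r.id);

console.log("case A ids:", passedA); // id 4 passes the filter
console.log("case B ids:", passedB); // no rows pass the filter
```

Under these assumptions, a new id is picked up whenever its timestamp is later than the latest one already stored, and skipped otherwise.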