activewarehouse / activewarehouse-etl

Extract-Transform-Load library from ActiveWarehouse
MIT License
240 stars 102 forks source link

Audit SCD code #29

Open thbar opened 13 years ago

thbar commented 13 years ago

I don't have a clear understanding of how SCD works currently in activewarehouse-etl, so I plan to audit the code, write more tests, and document myself.

See http://en.wikipedia.org/wiki/Slowly_changing_dimension

thbar commented 13 years ago

For the record: @lgustafson reported on the google group:

I am using it for type 2 changes primarily.  There are some issues
with it, however.  When a type 2 change is detected, the outdated
record is *deleted* from the table, and then a new version of the
outdated record (with new effective end date and latest flag) as well
as the new version of the record is placed into the pipeline.  This is
problematic if you are using foreign keys on your fact tables, and is
dangerous should your ETL process fail.  I have experimented with some
changes to the code that would allow for updating the outdated record
rather than delete/insert.

Another issue is that one should be able to specify the type of SCD
change at the column level.  For example, you may want some columns in
a table to support type 1 changes (immediate update, no new record)
and others support type 2 changes.
thbar commented 13 years ago

Interesting commits:

https://github.com/darrell/activewarehouse-etl/commit/0d559dd619aa776bdfc2e7cb71a9ee3a19d2d5b8 https://github.com/darrell/activewarehouse-etl/commit/3fc33e37528750b6c2b59b02595af5a5157c6519 https://github.com/darrell/activewarehouse-etl/commit/8971f3851f7b312e4e1c12d806f3025251e8fc68