agent86ray / SQL-AGENT-JOB-DATA-MART

This project builds a data mart from the SQL Server Agent tables. It implements an increment load design and includes type 2 slowly changing dimension handling for jobs and job steps.
1 stars 0 forks source link

Develop General Incremental Extract Pattern #5

Open agent86ray opened 4 years ago

agent86ray commented 4 years ago

Goal is to extract rows from the source table that have been added, updated or deleted since the last time the ETL process was run.

Need to some some analysis. May use profiling to get details. See https://docs.microsoft.com/en-us/sql/integration-services/control-flow/data-profiling-task-and-viewer

Sometimes the source table will have one or more columns that we can use to easily determine the rows that have changed.

There should be a target dimension or fact table that we can query to retrieve the column(s) thst we need to compare with the source to determine what changed. sysjobs is a great example but not every source table has all of the information we need in individual columns.

Sometimes we can use built-in features in the database to determine the rows that have changed. In the case of SQL Server we have change tracking and change data capture.

Sometimes people use tables to store the incremental load columns.

We could use a database snapshot to "freeze" the values in the source system.

By default I usually get every column in the source table just as a convenience so if I need to add a column to a target dimension or fact table, I don't have to go back and modify the extract.