`scd2` validity column values based on column in source data

Feature description

There are currently two options for validity column (i.e. "valid from" and "valid to") values:

based on "pipeline run timestamp" ➜ default
based on arbitrary value configured using the boundary_timestamp argument

I would like to see a third option:

based on source data column

In some cases an updated_at-style column in the source data "naturally" provides a good value for the validity columns.

It then makes sense to also detect changes based on this column, i.e. to base the "row hash" on a composite of the natural key and updated_at (i.e. the surrogate key).

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

User asked for it: https://github.com/dlt-hub/dlt/issues/1601#issuecomment-2264975587

Proposed solution

Extend public API with natural_key and boundary_timestamp_column arguments:

@dlt.resource(
    write_disposition={
        "disposition": "merge",
        "strategy": "scd2",
        "natural_key": "customer_id",
        "boundary_timestamp_column": "updated_at"
    }
)

dlt-hub / dlt