dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

create a snapshot on sources that already have some history #10320

Open graciegoheen opened 3 months ago

graciegoheen commented 3 months ago

Jumping into the discussion to add one suggestion/idea. Sorry if this was already discussed, I couldn't find it.

One complaint I hear from time to time is that you can't create a snapshot if the source has duplicates, i.e. several rows for the same unique key.

Imagine we have a source like

| id | status    | date       |
|----|-----------|------------|
| 1  | created   | 2024-01-01 |
| 1  | processed | 2024-02-01 |

It would be nice if the snapshot, in its first run, could read this source and be built like

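| id | status    | date       | dbt_valid_from | dbt_valid_to |
|----|-----------|------------|----------------|--------------|
| 1  | created   | 2024-01-01 | 2024-01-01     | 2024-02-01   |
| 1  | processed | 2024-02-01 | 2024-02-01     | null         |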

We could have something similar to incremental models.

It would only work for the timestamp strategy, because the snapshot must know what is older and what is newer.
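
To make the idea concrete, here is a rough sketch in plain Python of how the first run could derive the validity columns from existing history. This is not dbt's implementation or API: `backfill_validity` and its `key` / `updated_at` parameters are made-up names for illustration. It simply orders each key's rows by the timestamp and uses the next row's timestamp as `dbt_valid_to`, which is roughly what a `LEAD` window function would do in SQL.

```python
from collections import defaultdict
from datetime import date


def backfill_validity(rows, key="id", updated_at="date"):
    """Hypothetical helper: derive dbt_valid_from / dbt_valid_to from history.

    Each version of a key is valid from its own timestamp until the
    timestamp of the next version; the latest version stays open (None).
    """
    # Group the historical rows by their unique key.
    by_key = defaultdict(list)
    for row in rows:
        by_key[row[key]].append(row)

    result = []
    for versions in by_key.values():
        # The timestamp is what tells us which version is older or newer.
        ordered = sorted(versions, key=lambda r: r[updated_at])
        # Pair each version with its successor (None for the latest one).
        for current, nxt in zip(ordered, ordered[1:] + [None]):
            result.append({
                **current,
                "dbt_valid_from": current[updated_at],
                "dbt_valid_to": nxt[updated_at] if nxt is not None else None,
            })
    return result


# The source from the example above.
source = [
    {"id": 1, "status": "created", "date": date(2024, 1, 1)},
    {"id": 1, "status": "processed", "date": date(2024, 2, 1)},
]
snapshot = backfill_validity(source)
```

With the example source, the `created` row is closed at 2024-02-01 and the `processed` row keeps an open `dbt_valid_to`. Without a reliable timestamp column there is nothing to sort on, which is why this would not carry over to the check strategy.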

Just an idea; maybe there are other ways to do it. I'm raising it because it's frustrating that we can't create a snapshot on sources that already have some history.

By the way, loved that snapshots are in the spotlight! 🧡

Originally posted by @bruno-szdl in https://github.com/dbt-labs/dbt-core/discussions/7018#discussioncomment-9706404

graciegoheen commented 3 months ago

More thinking here -> https://github.com/dbt-labs/dbt-core/issues/3878