databrickslabs / dlt-meta

Metadata driven Databricks Delta Live Tables framework for bronze/silver pipelines
https://databrickslabs.github.io/dlt-meta/

Add support for dlt.apply_changes_from_snapshot #86

Open ravi-databricks opened 1 month ago

ravi-databricks commented 1 month ago

Provide support for dlt.apply_changes_from_snapshot

ravi-databricks commented 1 week ago

Implementation Details:

Onboarding:

  1. Introduce a snapshot format in the onboarding file
  2. Introduce the bronze_apply_changes_from_snapshot config; keys and scd_type are mandatory fields
    "bronze_apply_changes_from_snapshot": {
      "keys": ["id"],
      "scd_type": "1",
      "track_history_column_list": [],
      "track_history_except_column_list": []
    }
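The mandatory-field rule in item 2 could be checked at onboarding time roughly as follows. This is a minimal sketch, not dlt-meta code: the helper name `validate_snapshot_config` is hypothetical, and limiting `scd_type` to "1" or "2" is an assumption based on the SCD types Delta Live Tables supports.

```python
# Hypothetical helper, not part of dlt-meta: checks the proposed config block.
REQUIRED_FIELDS = ("keys", "scd_type")
ALLOWED_SCD_TYPES = ("1", "2")  # assumption: only SCD type 1 and 2 are valid


def validate_snapshot_config(onboarding_row: dict) -> dict:
    """Return the snapshot config if its mandatory fields are present and valid."""
    cfg = onboarding_row.get("bronze_apply_changes_from_snapshot")
    if cfg is None:
        raise ValueError("bronze_apply_changes_from_snapshot is missing")

    # An empty "keys" list or empty "scd_type" string counts as missing.
    missing = [field for field in REQUIRED_FIELDS if not cfg.get(field)]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")

    if str(cfg["scd_type"]) not in ALLOWED_SCD_TYPES:
        raise ValueError(f"unsupported scd_type: {cfg['scd_type']}")
    return cfg
```

Failing early here would surface a bad onboarding row before the pipeline starts, instead of at the bronze write.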

DataflowPipeline:

  1. Add an argument to DataflowPipeline to accept snapshot_reader_func
  2. snapshot_reader_func will be passed to dlt.apply_changes_from_snapshot during the bronze write
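The two steps above might be wired together along these lines. This is only a sketch under assumptions: `write_bronze_from_snapshot` is a hypothetical name, and the `dlt` module is passed in as an argument so the example can run outside a pipeline. The keyword names (`target`, `source`, `keys`, `stored_as_scd_type`) follow the Databricks documentation for `dlt.apply_changes_from_snapshot`; how DataflowPipeline will actually do this is not settled in the issue.

```python
def write_bronze_from_snapshot(dlt_module, target, snapshot_reader_func, cfg):
    """Forward the reader function and onboarding config to the dlt API.

    dlt_module is the imported `dlt` module inside a pipeline (injected here
    so the sketch is testable); cfg is the bronze_apply_changes_from_snapshot
    block from the onboarding file.
    """
    kwargs = {
        "target": target,
        "source": snapshot_reader_func,  # callable that yields snapshots
        "keys": cfg["keys"],
        "stored_as_scd_type": int(cfg["scd_type"]),  # config stores it as a string
    }
    # The two track_history options are alternatives, so pass only non-empty ones.
    for name in ("track_history_column_list", "track_history_except_column_list"):
        if cfg.get(name):
            kwargs[name] = cfg[name]

    # The target streaming table has to exist before changes are applied to it.
    dlt_module.create_streaming_table(name=target)
    dlt_module.apply_changes_from_snapshot(**kwargs)
    return kwargs
```

Inside a real pipeline the call would look like `write_bronze_from_snapshot(dlt, "bronze_table", next_snapshot_and_version, cfg)`, where the table name is a placeholder.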

Usage:

  1. Provide the snapshot reader function in a notebook when invoking DataflowPipeline
  2. Introduce new method
    pip install dlt-meta
    
    def next_snapshot_and_version():
        <<Provide logic here>>

    layer = spark.conf.get("layer", None)
    from src.dataflow_pipeline import DataflowPipeline
    DataflowPipeline.invoke_dlt_pipeline(spark, layer, snapshot_reader_func=next_snapshot_and_version)
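One way the reader logic could look is sketched below. It assumes the contract described in the Databricks documentation for `dlt.apply_changes_from_snapshot`, where the function receives the latest processed snapshot version and returns a `(DataFrame, version)` tuple, or `None` when nothing is left to process; the snippet in this issue shows the function without arguments, so the final signature may differ. `make_snapshot_reader`, `available_versions` and `load_snapshot` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def make_snapshot_reader(
    available_versions: Iterable[int],
    load_snapshot: Callable[[int], Any],
) -> Callable[[Optional[int]], Optional[Tuple[Any, int]]]:
    """Build a reader that hands out snapshots in ascending version order.

    load_snapshot(version) is expected to return the snapshot DataFrame,
    for example by reading a versioned path with spark.read.
    """
    versions = sorted(available_versions)

    def next_snapshot_and_version(latest_snapshot_version: Optional[int]):
        # On the first run nothing has been processed, so every version is pending.
        pending = [
            v for v in versions
            if latest_snapshot_version is None or v > latest_snapshot_version
        ]
        if not pending:
            return None  # signals that there is no further snapshot
        version = pending[0]
        return load_snapshot(version), version

    return next_snapshot_and_version
```

The resulting function could then be passed as `snapshot_reader_func` in the `invoke_dlt_pipeline` call shown above.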