databrickslabs / dlt-meta

This is a metadata-driven, DLT-based framework for bronze/silver pipelines

Handling historical silver transformations with Full Refresh #29

Closed WilliamMize closed 4 months ago

WilliamMize commented 6 months ago

I've built a process to generate our silver_transformations.json file. I have a question about what a DLT "Full refresh" does when the silver transformations for a table change over time. If a "Full refresh" is run, I believe it would pick up only what is currently in the dataflowspecTable for silver, which would miss any historical transformations that were in effect at their respective times. Am I missing something architecturally in DLT or DLT-meta that handles this use case? I appreciate any help or conversation on this topic.
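To make the concern concrete, here is a minimal, self-contained sketch in plain Python. It does not use DLT or dlt-meta at all; `Batch`, `HISTORY`, `incremental`, and `full_refresh` are hypothetical names invented for illustration. It assumes each batch was originally processed with whatever transformation was in effect when it arrived, and that a full refresh reprocesses everything with only the newest transformation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Batch:
    arrived: int       # hypothetical arrival day of the batch
    rows: List[Dict]   # raw (bronze-like) rows


# Hypothetical transformation history: (effective_from_day, transform).
# Entries are kept sorted by effective day.
HISTORY = [
    (0, lambda r: {**r, "amount": r["amount"]}),         # original rule
    (10, lambda r: {**r, "amount": r["amount"] * 100}),  # rule changed on day 10
]


def transform_for(day: int):
    """Return the transformation that was in effect on the given day."""
    applicable = [fn for start, fn in HISTORY if start <= day]
    return applicable[-1]


def incremental(batches: List[Batch]) -> List[Dict]:
    """Each batch is transformed with the rule in effect when it arrived."""
    out = []
    for batch in batches:
        fn = transform_for(batch.arrived)
        out.extend(fn(row) for row in batch.rows)
    return out


def full_refresh(batches: List[Batch]) -> List[Dict]:
    """Everything is reprocessed with only the latest rule."""
    latest = HISTORY[-1][1]
    return [latest(row) for batch in batches for row in batch.rows]


batches = [
    Batch(arrived=1, rows=[{"id": 1, "amount": 5}]),
    Batch(arrived=12, rows=[{"id": 2, "amount": 7}]),
]

print(incremental(batches))   # row 1 keeps the old rule, row 2 gets the new one
print(full_refresh(batches))  # both rows get the new rule
```

In this toy setup the two outputs differ for the row that arrived before the rule change, which is exactly the divergence the question is asking about.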

ravi-databricks commented 6 months ago

@WilliamMize, yes! A full refresh applies the latest silver transformations present in silver_dataflow_spec. There are two ways you can look into this

WilliamMize commented 6 months ago

@ravi-databricks, I appreciate your input! I often find myself preemptively strategizing for potential issues in our data architecture. There are two approaches I'm considering to enhance our system's resilience and flexibility:

Your thoughts on these approaches would be greatly appreciated.

ravi-databricks commented 6 months ago

@WilliamMize, there is a version option when onboarding a dataflowspec, which means you can have multiple versions for the same inputs and outputs. You need to increment the version so that the new dataflowspec is applied to the pipeline.
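The versioning idea can be sketched roughly as follows. This is plain Python rather than dlt-meta code: the record layout, the field names `dataFlowId`, `version`, and `select_exp`, the `v<N>` version format, and the helper names are all assumptions made for illustration, and how dlt-meta itself resolves versions is not shown in this thread.

```python
import re
from typing import Dict, List


def version_number(version: str) -> int:
    """Parse a hypothetical 'v<N>' version label into an integer."""
    match = re.fullmatch(r"v(\d+)", version)
    if match is None:
        raise ValueError(f"unexpected version label: {version!r}")
    return int(match.group(1))


def latest_specs(specs: List[Dict]) -> Dict[str, Dict]:
    """Keep only the highest-versioned spec for each data flow."""
    latest: Dict[str, Dict] = {}
    for spec in specs:
        key = spec["dataFlowId"]
        current = latest.get(key)
        if current is None or (
            version_number(spec["version"]) > version_number(current["version"])
        ):
            latest[key] = spec
    return latest


# Hypothetical onboarded specs: two versions for the same input and output.
specs = [
    {"dataFlowId": "100", "version": "v1", "select_exp": ["amount"]},
    {"dataFlowId": "100", "version": "v2", "select_exp": ["amount * 100 as amount"]},
    {"dataFlowId": "200", "version": "v1", "select_exp": ["name"]},
]

for flow_id, spec in latest_specs(specs).items():
    print(flow_id, spec["version"], spec["select_exp"])
```

Older versions stay in the list, so the history of transformations remains inspectable even though only the newest one is selected.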

WilliamMize commented 4 months ago

@ravi-databricks Thank you for your help! I've got dlt-meta setup for our first system in production now.

ravi-databricks commented 4 months ago

Awesome!! @WilliamMize would love to hear about your journey to prod with dlt-meta.