atc-net / atc-dataplatform

A common set of python libraries for DataBricks
https://atc-net.github.io/repository/atc-dataplatform
MIT License

feat: etl process framework #256

Closed MartinBoge closed 1 year ago

MartinBoge commented 1 year ago

This PR provides a way to keep the same structure across all ETL flows/processes by introducing a new class called ETLProcess. ETLProcess is a base class to inherit from when defining ETL processes. When building ETL processes, there is a need to maintain a standardized structure and to provide common logic around the process as well. In addition, two base classes are provided: one for a parameter object (ETLParameters) and one for a call arguments object (ETLCallArgs).
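To illustrate the idea, here is a minimal, self-contained sketch of how such a base class could look. It is not the code in this PR: only the class names ETLProcess, ETLParameters and ETLCallArgs come from the description above. The `extract`/`transform`/`load`/`run` methods, the `Scale*` classes and all field names are hypothetical and assumed for illustration only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ETLParameters:
    """Base for static configuration of a process (fields are added in subclasses)."""


@dataclass
class ETLCallArgs:
    """Base for arguments that vary from one run to the next."""


class ETLProcess(ABC):
    """Hypothetical base class: subclasses fill in the steps, run() fixes the order."""

    def __init__(self, params: ETLParameters) -> None:
        self.params = params

    @abstractmethod
    def extract(self, args: ETLCallArgs) -> Any: ...

    @abstractmethod
    def transform(self, data: Any, args: ETLCallArgs) -> Any: ...

    @abstractmethod
    def load(self, data: Any, args: ETLCallArgs) -> None: ...

    def run(self, args: ETLCallArgs) -> None:
        # Common logic shared by every process: the step order is fixed here.
        data = self.extract(args)
        data = self.transform(data, args)
        self.load(data, args)


# --- Hypothetical concrete process, plain Python lists instead of DataFrames ---

@dataclass
class ScaleParameters(ETLParameters):
    factor: int = 1


@dataclass
class ScaleCallArgs(ETLCallArgs):
    values: List[int] = field(default_factory=list)


class ScaleProcess(ETLProcess):
    def __init__(self, params: ScaleParameters) -> None:
        super().__init__(params)
        self.sink: List[int] = []  # stands in for a target table

    def extract(self, args: ScaleCallArgs) -> List[int]:
        return list(args.values)

    def transform(self, data: List[int], args: ScaleCallArgs) -> List[int]:
        return [value * self.params.factor for value in data]

    def load(self, data: List[int], args: ScaleCallArgs) -> None:
        self.sink.extend(data)


process = ScaleProcess(ScaleParameters(factor=2))
process.run(ScaleCallArgs(values=[1, 2, 3]))
print(process.sink)  # [2, 4, 6]
```

In this sketch the base class owns the control flow, so every process runs its steps in the same order, while the parameter and call-argument objects keep configuration separate from per-run input.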

Background: from experience, a nice way to build ETL processes is to structure them in the following manner:

Outcome: when inheriting from the ETLProcess base class, the following applies:

Feel free to reach out for a more in-depth explanation and discussion. Martin Bøge