databrickslabs / dlt-meta

Metadata driven Databricks Delta Live Tables framework for bronze/silver pipelines
https://databrickslabs.github.io/dlt-meta/

Bring your own custom transformations for bronze/silver layer #68

Closed ravi-databricks closed 2 months ago

ravi-databricks commented 2 months ago

dlt-meta needs to support custom transformations for the bronze and silver layers. After reading from the source, the pipeline reader returns a DataFrame; if a customer wants to transform that DataFrame, dlt-meta should be able to call a custom function to do so. For example, the transformations below need to be applied to the input DataFrame:

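from pyspark.sql import DataFrame
from pyspark.sql.functions import lit


def custom_transform_func1(input_df: DataFrame) -> DataFrame:
    # Add a constant column to the DataFrame returned by the reader
    return input_df.withColumn('custom_col', lit('test1'))


def custom_transform_func2(input_df: DataFrame) -> DataFrame:
    # Replaces 'custom_col' if an earlier transformation already added it
    return input_df.withColumn('custom_col', lit('test2'))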

dlt-meta needs a placeholder in onboarding.json for the transformation functions and their order, e.g. "transformation_functions": ["custom_transform_func1", "custom_transform_func2"]

You need to make these functions available to the DLT notebook, either by adding them to the notebook before calling the dlt-meta generic pipeline or by pip-installing your custom function library. This way DLT can resolve the functions at runtime.
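To make the ordering requirement concrete, here is a minimal sketch of how a list of function names could be resolved and applied in sequence. It is not the dlt-meta implementation: the helper `apply_transformations` and the `registry` mapping are hypothetical names chosen for illustration, and the helper is written against plain callables so it does not depend on Spark.

```python
import json
from typing import Any, Callable, Dict, List


def apply_transformations(
    df: Any,
    function_names: List[str],
    registry: Dict[str, Callable[[Any], Any]],
) -> Any:
    """Hypothetical helper (not part of dlt-meta): apply named functions in order."""
    for name in function_names:
        if name not in registry:
            # Fail early if a configured function was never attached to the notebook
            raise KeyError(f"transformation function not found: {name}")
        df = registry[name](df)
    return df


# Assumed shape of the onboarding entry, taken from the example in this issue
onboarding_entry = json.loads(
    '{"transformation_functions": ["custom_transform_func1", "custom_transform_func2"]}'
)
function_names = onboarding_entry["transformation_functions"]
```

In a notebook where `custom_transform_func1` and `custom_transform_func2` are defined, the registry could be built as `{f.__name__: f for f in (custom_transform_func1, custom_transform_func2)}` and passed along with the reader's DataFrame. Because both example functions write to `custom_col` and `withColumn` replaces an existing column of the same name, applying them in the listed order leaves `custom_col` set to `'test2'`, which is why the order in onboarding.json matters.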