databrickslabs / dlt-meta

This is a metadata-driven, DLT-based framework for bronze/silver pipelines.

Ability to add columns to bronze tables similar to silver table query #15

Open Lackshu opened 9 months ago

Lackshu commented 9 months ago

This isn't an issue but a feature request. It would be useful to be able to add columns to bronze tables, for example to record the source file name or the processing time. Example:

```scala
val df = spark.readStream.format("cloudFiles")
  .schema(schema)
  .option("cloudFiles.format", "csv")
  .option("cloudFiles.region", "ap-south-1")
  .load("path")
  .withColumn("filePath", input_file_name())
```

WilliamMize commented 9 months ago

I did this in my own cloned branch, inside `read_dlt_cloud_files`:

```python
from datetime import datetime
from pyspark.sql.functions import input_file_name, lit

df = (
    spark.readStream.format(bronze_dataflow_spec.sourceFormat)
    .options(**reader_config_options)
    .schema(schema)
    .load(source_path)
    .withColumn("_filePath", input_file_name())
    .withColumn("_loadDate", lit(datetime.now()))
)
```

I thought about making a pull request for this, but I didn't want it to add those two columns every time, and I wasn't sure of the most elegant way to give the user the option of including them.
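One way to make the columns opt-in is to gate each `withColumn` call on a flag carried alongside the reader options. A minimal sketch of that pattern; the option names (`include_file_path`, `include_load_date`) and the `FakeDF` stand-in are hypothetical illustrations, not part of dlt-meta:

```python
from datetime import datetime


class FakeDF:
    """Minimal stand-in for a Spark DataFrame, used here only so the
    chaining pattern can be shown without a Spark session."""

    def __init__(self, columns=None):
        self.columns = list(columns or [])

    def withColumn(self, name, value):
        # Return a new frame with the extra column name appended;
        # the value expression is ignored in this stub.
        return FakeDF(self.columns + [name])


def add_audit_columns(df, options):
    """Conditionally append audit columns based on opt-in flags.

    In the real pipeline the values would be Spark Column expressions
    (input_file_name(), lit(datetime.now())) instead of plain values.
    """
    if options.get("include_file_path", False):
        df = df.withColumn("_filePath", "input_file_name()")
    if options.get("include_load_date", False):
        df = df.withColumn("_loadDate", datetime.now())
    return df
```

With this shape the defaults stay unchanged, and users who want the audit columns simply set the flags in their spec.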

ravi-databricks commented 8 months ago

We can add bring-your-own-transformations functionality so that you can add columns.
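A bring-your-own-transformations hook could be as simple as applying an ordered list of user-supplied DataFrame-to-DataFrame callables after the base read. This is a sketch of one possible shape; `apply_custom_transforms` is a hypothetical name, not an actual dlt-meta API:

```python
def apply_custom_transforms(df, transforms):
    """Apply each user-supplied callable in order.

    Each entry in `transforms` takes a DataFrame and returns a DataFrame,
    e.g. lambda df: df.withColumn("_filePath", input_file_name()).
    The loop itself is generic, so it composes any chain of transforms.
    """
    for fn in transforms:
        df = fn(df)
    return df
```

Users could then register their own `withColumn` transforms in the onboarding spec, and the framework would not need to hard-code any particular audit columns.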