Open Lackshu opened 9 months ago
I did this in my own cloned branch, inside of `read_dlt_cloud_files`:

```python
from datetime import datetime
from pyspark.sql.functions import input_file_name, lit

(spark.readStream.format(bronze_dataflow_spec.sourceFormat)
    .options(**reader_config_options)
    .schema(schema)
    .load(source_path)
    .withColumn("_filePath", input_file_name())
    .withColumn("_loadDate", lit(datetime.now())))
```
I thought about making a pull request for this, but I didn't want it to add those two columns every time, and I wasn't sure of the most elegant way to give the user the option of including them.
We can add "bring your own transformations" functionality so that you can add columns.
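A minimal sketch of how such a hook could be shaped (the function name and wiring here are hypothetical, not part of the current codebase): the framework applies each user-supplied callable to the streaming DataFrame after the base `readStream`, so optional columns like `_filePath` or `_loadDate` are only added when the user registers a transform for them.

```python
# Hypothetical "bring your own transformations" hook: each callable
# takes a DataFrame and returns a DataFrame, applied in order after
# the base readStream has been built.
def apply_custom_transforms(df, transforms):
    for transform in transforms:
        df = transform(df)
    return df

# Demo with a plain list standing in for a DataFrame's columns:
if __name__ == "__main__":
    add_cols = [
        lambda cols: cols + ["_filePath"],
        lambda cols: cols + ["_loadDate"],
    ]
    # prints ['id', 'amount', '_filePath', '_loadDate']
    print(apply_custom_transforms(["id", "amount"], add_cols))
```

In the real implementation each transform would be a `DataFrame -> DataFrame` function (e.g. `lambda df: df.withColumn("_filePath", input_file_name())`), which keeps the metadata columns opt-in without hard-coding them into the reader.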
This is not an issue but a feature request. It would be useful if we want to add the source file name or the processing time as columns. Example:

```scala
val df = spark.readStream.format("cloudFiles")
  .schema(schema)
  .option("cloudFiles.format", "csv")
  .option("cloudFiles.region", "ap-south-1")
  .load("path")
  .withColumn("filePath", input_file_name())
```