ravi-databricks opened 2 months ago
Introduced `bronze_append_flows` and `silver_append_flows` inside the onboarding file, with the structure below:
e.g., if the main bronze table `customer` needs to be populated from different datasets, DLT-META can launch multiple flows under:
```json
"bronze_append_flows": [
  {
    "name": "customer_bronze_flow",
    "create_streaming_table": false,
    "source_format": "cloudFiles",
    "source_details": {
      "source_path_it": "{dbfs_path}/integration_tests/resources/data/customers_af",
      "source_schema_path": "{dbfs_path}/integration_tests/resources/customers.ddl"
    },
    "reader_options": {
      "cloudFiles.format": "json",
      "cloudFiles.inferColumnTypes": "true",
      "cloudFiles.rescuedDataColumn": "_rescued_data"
    },
    "once": false
  }
]
```
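As a rough illustration of how one such entry might be parsed from the onboarding file (the `AppendFlow` dataclass and `parse_append_flows` helper below are hypothetical, not part of DLT-META):

```python
import json
from dataclasses import dataclass, field

# Hypothetical dataclass mirroring one entry of "bronze_append_flows";
# field names follow the onboarding JSON keys shown above.
@dataclass
class AppendFlow:
    name: str
    source_format: str
    source_details: dict
    create_streaming_table: bool = False
    reader_options: dict = field(default_factory=dict)
    once: bool = False

def parse_append_flows(onboarding_json: str) -> list:
    """Parse the bronze_append_flows section of an onboarding file."""
    spec = json.loads(onboarding_json)
    return [AppendFlow(**entry) for entry in spec.get("bronze_append_flows", [])]

example = """{
  "bronze_append_flows": [
    {
      "name": "customer_bronze_flow",
      "create_streaming_table": false,
      "source_format": "cloudFiles",
      "source_details": {"source_path_it": "/data/customers_af"},
      "reader_options": {"cloudFiles.format": "json"},
      "once": false
    }
  ]
}"""

flows = parse_append_flows(example)
print(flows[0].name)  # customer_bronze_flow
```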
With the above example, when `kafka` is the `source_format`, append flows can contain multiple topics in `source_details` and `reader_options`.
As a result of the above change, pipeline readers need to be restructured to hold state information such as `source_details`, `source_format`, `reader_options`, and `schema_json`. This ensures `dlt.append_flow` can invoke the respective callable functions from `PipelineReaders`, such as `read_dlt_cloud_files`, `read_dlt_delta`, and `read_kafka`.
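A minimal sketch of what such a stateful reader class could look like; the method names come from the description above, but the bodies are illustrative stubs, not the actual DLT-META implementation:

```python
# Illustrative sketch: a PipelineReaders-style class that carries its
# source configuration as state, so dlt.append_flow can be handed a
# zero-argument callable per flow. The Spark reads are stubbed out.
class PipelineReaders:
    def __init__(self, spark, source_format, source_details,
                 reader_options=None, schema_json=None):
        self.spark = spark
        self.source_format = source_format
        self.source_details = source_details
        self.reader_options = reader_options or {}
        self.schema_json = schema_json

    def read_dlt_cloud_files(self):
        # Real pipeline would do: self.spark.readStream.format("cloudFiles")...
        return ("cloudFiles", self.source_details.get("source_path_it"))

    def read_dlt_delta(self):
        return ("delta", self.source_details.get("source_path_it"))

    def read_kafka(self):
        return ("kafka", self.source_details.get("source_topic"))

    def get_reader(self):
        """Pick the callable that dlt.append_flow should invoke."""
        readers = {
            "cloudfiles": self.read_dlt_cloud_files,
            "delta": self.read_dlt_delta,
            "kafka": self.read_kafka,
        }
        return readers[self.source_format.lower()]

reader = PipelineReaders(
    spark=None,
    source_format="cloudFiles",
    source_details={"source_path_it": "/data/customers_af"},
)
print(reader.get_reader()())  # ('cloudFiles', '/data/customers_af')
```

Keeping the configuration on the instance means each append flow gets its own fully-configured reader callable, which matches how `dlt.append_flow` expects a no-argument function.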
Incorporated additional parameters for `dlt.apply_changes`:
- `flow_name`
- `once`
- `ignore_null_updates_column_list`
- `ignore_null_updates_except_column_list`
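A sketch of how these optional parameters might be threaded from the onboarding config into the `dlt.apply_changes` call; the `cdc_spec` keys and the helper function are assumptions for illustration:

```python
# Illustrative: build the keyword arguments for dlt.apply_changes,
# passing the new optional parameters only when the onboarding
# config supplies them. The cdc_spec keys here are assumptions.
def build_apply_changes_kwargs(cdc_spec: dict) -> dict:
    kwargs = {
        "target": cdc_spec["target"],
        "source": cdc_spec["source"],
        "keys": cdc_spec["keys"],
        "sequence_by": cdc_spec["sequence_by"],
    }
    for optional in (
        "flow_name",
        "once",
        "ignore_null_updates_column_list",
        "ignore_null_updates_except_column_list",
    ):
        if cdc_spec.get(optional) is not None:
            kwargs[optional] = cdc_spec[optional]
    return kwargs

kwargs = build_apply_changes_kwargs({
    "target": "silver_customer",
    "source": "bronze_customer",
    "keys": ["customer_id"],
    "sequence_by": "event_ts",
    "flow_name": "customer_cdc_flow",
    "once": False,
})
# Inside the pipeline this would then be: dlt.apply_changes(**kwargs)
```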
@ganeshchand @neil90 @howardwu-db
Integrate the `append_flow` API for the following use cases:
API DOCS Ref