databrickslabs / dlt-meta

A metadata-driven, DLT-based framework for bronze/silver pipelines

50 integrate append flow api #58

Closed ravi-databricks closed 4 days ago

ravi-databricks commented 5 days ago

Incorporated the dlt.append_flow API into dlt-meta. This introduces a new config in the onboarding file for the bronze/silver layers, as shown below:


        "bronze_append_flows": [
            {
                "name": "customer_bronze_flow",
                "create_streaming_table": false,
                "source_format": "cloudFiles",
                "source_details": {
                    "source_database": "APP",
                    "source_table": "CUSTOMERS",
                    "source_path_dev": "tests/resources/data/customers_af",
                    "source_schema_path": "tests/resources/schema/customers.ddl"
                },
                "reader_options": {
                    "cloudFiles.format": "json",
                    "cloudFiles.inferColumnTypes": "true",
                    "cloudFiles.rescuedDataColumn": "_rescued_data"
                },
                "once": true
            }
        ]

This way, multiple sources can be added to a single DLT pipeline, all appending to the same target table. In dataflow_pipeline.py, added AppendFlowWriter, which takes this config and calls the dlt.append_flow API.
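For readers who want a feel for how such a config could be turned into append flows, here is a minimal sketch. It is not the AppendFlowWriter from this PR: `AppendFlowSpec`, `parse_append_flows`, and `register_append_flows` are hypothetical names, the `dlt` module and Spark session are passed in as arguments so the snippet can run outside a pipeline, and forwarding `once` to `dlt.append_flow` is an assumption based on the config above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AppendFlowSpec:
    """Hypothetical holder for one entry of "bronze_append_flows"."""
    name: str
    source_format: str
    source_details: Dict[str, str]
    reader_options: Dict[str, str] = field(default_factory=dict)
    create_streaming_table: bool = False
    once: bool = False


# Mirrors the onboarding snippet from the PR description.
EXAMPLE_ROW = {
    "bronze_append_flows": [
        {
            "name": "customer_bronze_flow",
            "create_streaming_table": False,
            "source_format": "cloudFiles",
            "source_details": {
                "source_database": "APP",
                "source_table": "CUSTOMERS",
                "source_path_dev": "tests/resources/data/customers_af",
                "source_schema_path": "tests/resources/schema/customers.ddl",
            },
            "reader_options": {
                "cloudFiles.format": "json",
                "cloudFiles.inferColumnTypes": "true",
                "cloudFiles.rescuedDataColumn": "_rescued_data",
            },
            "once": True,
        }
    ]
}


def parse_append_flows(row: Dict[str, Any],
                       key: str = "bronze_append_flows") -> List[AppendFlowSpec]:
    """Build specs from an onboarding row; unknown keys raise TypeError."""
    return [AppendFlowSpec(**entry) for entry in row.get(key, [])]


def register_append_flows(dlt_module, spark, target: str,
                          flows: List[AppendFlowSpec], env: str = "dev") -> None:
    """Register one append flow per spec, all writing to `target`.

    `dlt_module` is expected to be the `dlt` module inside a pipeline;
    it is injected here so the function can be exercised with a stub.
    """
    for flow in flows:
        if flow.create_streaming_table:
            dlt_module.create_streaming_table(name=target)

        path = flow.source_details[f"source_path_{env}"]

        # Default arguments bind the current loop values to this reader.
        def read_source(flow=flow, path=path):
            return (spark.readStream.format(flow.source_format)
                    .options(**flow.reader_options)
                    .load(path))

        # Assumption: `once` is forwarded as-is to dlt.append_flow.
        dlt_module.append_flow(target=target, name=flow.name,
                               once=flow.once)(read_source)
```

Inside a pipeline notebook this would be called roughly as `register_append_flows(dlt, spark, "<target-table>", parse_append_flows(row))`, where the target table name comes from the rest of the onboarding entry.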