databrickslabs / dlt-meta

Metadata driven Databricks Delta Live Tables framework for bronze/silver pipelines
https://databrickslabs.github.io/dlt-meta/
Other
156 stars, 71 forks

Unity Catalog Lineage is not generated #93

Open ln-data-bass opened 2 months ago

ln-data-bass commented 2 months ago

When we deployed this to our environment, we noticed that no lineage is generated for our UC tables.

Is this expected behaviour?

My assumption is that the lineage would be generated by the DLT pipelines. Is there a way to sequence bronze and silver in one pipeline? Would that then generate the appropriate lineage in UC?

ravi-databricks commented 2 months ago

You should see lineage for the silver UC table pointing to bronze. Inside DLT-META we call the DLT APIs, so it should work the same way as it would for anyone writing notebook-based SQL or Python.

Is there a way to sequence bronze and silver in one pipeline? A: Currently DLT-META does not support chaining bronze and silver inside a single DLT pipeline.

ln-data-bass commented 2 months ago

So far our observation is that no lineage is generated for either the silver table (downstream) or the bronze table (upstream). We assumed this was because there were two pipelines. Any idea how to troubleshoot this?
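One way to troubleshoot is to check whether Unity Catalog recorded any lineage events for the tables at all, independent of what Catalog Explorer displays. The sketch below only builds a query string against the `system.access.table_lineage` system table. It assumes system tables are enabled for the account and that the column names used (`source_table_full_name`, `target_table_full_name`, `entity_type`, `event_time`) match the current Databricks documentation; the helper `lineage_query` and the table name in the example are hypothetical.

```python
import re

# Three-level Unity Catalog name: catalog.schema.table.
# Simple identifiers only (letters, digits, underscores); this also keeps
# arbitrary text out of the generated SQL.
_UC_NAME = re.compile(r"^[A-Za-z0-9_]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def lineage_query(table_full_name: str, limit: int = 100) -> str:
    """Return SQL listing lineage events where the table is a source or a target.

    Hypothetical helper; assumes the system.access.table_lineage schema.
    """
    if not _UC_NAME.match(table_full_name):
        raise ValueError(f"expected catalog.schema.table, got {table_full_name!r}")
    return (
        "SELECT source_table_full_name, target_table_full_name, "
        "entity_type, event_time\n"
        "FROM system.access.table_lineage\n"
        f"WHERE source_table_full_name = '{table_full_name}'\n"
        f"   OR target_table_full_name = '{table_full_name}'\n"
        "ORDER BY event_time DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(lineage_query("main.silver.customers"))
```

In a Databricks notebook the string could be passed to `spark.sql(...)`. An empty result for both the bronze and the silver table would suggest that no lineage was recorded at all, rather than a display problem in the lineage UI.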

WilliamMize commented 2 months ago

We see the same thing in our lineage. Our bronze and silver streaming tables are each in their own schema (bronze and silver); not sure whether this breaks it. Here is a snapshot of one of our bronze tables with no lineage into a silver table. If you look at the silver table's lineage, you don't see it going back to the bronze table.

[Screenshot]

ravi-databricks commented 2 months ago

I created branch Issue_94 to chain bronze and silver into a single DLT pipeline.

[Screenshot: dlt-meta-demo]

For now, you need to use Direct Publishing Mode, which is available in the Preview channel.

[Screenshot: direct_publishing_mode]

Here is how the dlt-meta configuration looks:

    {
        "configuration": {
            "layer": "bronze_silver",
            "bronze.group": "A1",
            "silver.group": "A1",
            "bronze.dataflowspecTable": "ucname.dlt_meta_dataflowspecs_schema.bronze_dataflowspec",
            "silver.dataflowspecTable": "ucname.dlt_meta_dataflowspecs_schema.silver_dataflowspec"
        }
    }

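The configuration follows a regular pattern: a `layer` value plus one `group` key and one `dataflowspecTable` key per layer. A minimal sketch that builds this map from a catalog name and a group, assuming the dataflowspec schema and table names shown in the config; the function `bronze_silver_configuration` is hypothetical and not part of dlt-meta.

```python
import json


def bronze_silver_configuration(
    catalog: str,
    group: str,
    spec_schema: str = "dlt_meta_dataflowspecs_schema",
) -> dict:
    """Build the pipeline 'configuration' map for a combined bronze/silver run.

    Hypothetical helper that mirrors the keys shown for branch Issue_94.
    """
    config = {"layer": "bronze_silver"}
    for layer in ("bronze", "silver"):
        # Same data flow group for both layers, one dataflowspec table per layer.
        config[f"{layer}.group"] = group
        config[f"{layer}.dataflowspecTable"] = (
            f"{catalog}.{spec_schema}.{layer}_dataflowspec"
        )
    return {"configuration": config}


if __name__ == "__main__":
    print(json.dumps(bronze_silver_configuration("ucname", "A1"), indent=4))
```

Generating the keys in a loop keeps the bronze and silver entries from drifting apart when the catalog or group changes; the key order differs from the snippet above, which does not matter for a JSON object.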
ln-data-bass commented 2 months ago

Thanks @ravi-databricks. Do you think this change (getting bronze and silver into the same DLT pipeline) would fix the missing lineage? Or is it Direct Publishing Mode that would fix the lineage issue?

We currently don't see any lineage for bronze or silver tables:

[Screenshot]

[Screenshot]

ravi-databricks commented 2 months ago

For silver tables, lineage shows the downstream pipelines; for bronze, there is nothing upstream. If you click on the lineage graph, you should see the streaming table.

[Screenshots: silver_linage, silver_linage_graph]

ganeshchand commented 2 months ago

If your silver tables are the target of APPLY CHANGES INTO, lineage currently won't show their upstream tables, as documented here.

ln-data-bass commented 2 months ago

Thanks for sharing, @ganeshchand. Would you expect downstream lineage to be generated? Currently we don't see any lineage at all. We've followed the instructions exactly as documented, and everything works as expected except that no lineage is generated anywhere (neither for bronze nor for silver).

ganeshchand commented 2 months ago

That doesn't sound normal to me. Would you be able to set up a test DLT pipeline with the same set of flows and dependencies but without dlt-meta, and see whether the lineage behavior is different? This will confirm whether or not it is an issue with dlt-meta.

ln-data-bass commented 2 months ago

Hi @ganeshchand, I confirmed that when I use the APPLY CHANGES INTO API in DLT, no lineage is generated (neither upstream from bronze nor downstream from silver). We see this both with and without DLT-META. So this is clearly a limitation of UC lineage and not of DLT-META (as per the documentation you linked).

When I don't use APPLY CHANGES INTO to update silver, both upstream and downstream lineage are generated.
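To compare the two experiments (silver written with and without APPLY CHANGES INTO), the lineage rows from each run can be reduced to upstream and downstream sets for a given table. A minimal sketch, assuming the rows have already been fetched as (source, target) pairs of full table names, for example from the `source_table_full_name` and `target_table_full_name` columns of `system.access.table_lineage`; the function `neighbours` and the table names in the example are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

# One lineage row: (source table full name, target table full name).
Edge = Tuple[Optional[str], Optional[str]]


def neighbours(table: str, edges: Iterable[Edge]) -> Tuple[Set[str], Set[str]]:
    """Return (upstream, downstream) table names recorded for `table`.

    Rows with a missing source or target (e.g. path-based reads) are skipped.
    """
    upstream: Set[str] = set()
    downstream: Set[str] = set()
    for source, target in edges:
        if source is None or target is None:
            continue
        if target == table:
            upstream.add(source)
        if source == table:
            downstream.add(target)
    return upstream, downstream


if __name__ == "__main__":
    rows = [
        ("main.bronze.customers", "main.silver.customers"),
        ("main.silver.customers", "main.gold.customer_summary"),
    ]
    up, down = neighbours("main.silver.customers", rows)
    print("upstream:", sorted(up))
    print("downstream:", sorted(down))
```

Two empty sets for both the bronze and the silver table, as reported above for the APPLY CHANGES INTO case, would mean no lineage edges were recorded for them at all.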

@ravi-databricks do you have any idea if Direct Publishing Mode will resolve this limitation in UC Lineage?