datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0

Databricks lineage extraction: update the connector to use the system.access.[table|column]_lineage system tables #9168

Status: Open. dmoore247 opened this issue 10 months ago.

dmoore247 commented 10 months ago

Describe the bug: The Databricks connector documentation describes extracting lineage by calling the lineage API once per table, which does not scale.

To Reproduce: See the docs: https://datahubproject.io/docs/generated/ingestion/sources/databricks/

Expected behavior: The connector should read all lineage events from the scalable system tables, system.access.table_lineage and system.access.column_lineage.
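To illustrate why the system tables scale better, here is a minimal sketch, not DataHub's implementation (the issue does not describe one). One query per system table returns lineage for every table at once, instead of one API call per table. It assumes the column names from the Databricks lineage system table docs (source_table_full_name, target_table_full_name, source_column_name, target_column_name, event_date); verify them against your workspace. The helper names (lookback_params, table_upstreams, column_upstreams) are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumed column names, per the Databricks lineage system table docs.
# DISTINCT collapses the many per-event rows into unique edges in the warehouse.
TABLE_LINEAGE_SQL = """
SELECT DISTINCT source_table_full_name, target_table_full_name
FROM system.access.table_lineage
WHERE event_date >= :start_date
  AND source_table_full_name IS NOT NULL
  AND target_table_full_name IS NOT NULL
"""

COLUMN_LINEAGE_SQL = """
SELECT DISTINCT source_table_full_name, source_column_name,
       target_table_full_name, target_column_name
FROM system.access.column_lineage
WHERE event_date >= :start_date
  AND source_table_full_name IS NOT NULL
  AND target_table_full_name IS NOT NULL
"""

ColumnRef = Tuple[str, str]  # (table full name, column name)


def lookback_params(days: int, today: date) -> Dict[str, date]:
    """Hypothetical helper: value for the :start_date named parameter marker."""
    return {"start_date": today - timedelta(days=days)}


def table_upstreams(
    rows: Iterable[Tuple[Optional[str], Optional[str]]],
) -> Dict[str, Set[str]]:
    """Group (source, target) rows into: target table -> set of upstream tables."""
    upstreams: Dict[str, Set[str]] = defaultdict(set)
    for source, target in rows:
        if source and target:  # defensively skip rows without table names
            upstreams[target].add(source)
    return dict(upstreams)


def column_upstreams(
    rows: Iterable[Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]],
) -> Dict[ColumnRef, Set[ColumnRef]]:
    """Group column rows into: (target table, column) -> set of (source table, column)."""
    upstreams: Dict[ColumnRef, Set[ColumnRef]] = defaultdict(set)
    for src_table, src_col, dst_table, dst_col in rows:
        if src_table and src_col and dst_table and dst_col:
            upstreams[(dst_table, dst_col)].add((src_table, src_col))
    return dict(upstreams)
```

With the databricks-sql-connector package, one would presumably run each query through a cursor, for example `cursor.execute(TABLE_LINEAGE_SQL, lookback_params(30, date.today()))` followed by `cursor.fetchall()`, and pass the rows to the grouping functions. Named `:param` markers assume connector version 3 or later. The grouped dictionaries map directly onto per-dataset upstream lineage, so the number of queries stays constant no matter how many tables the metastore holds.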

Additional context: Databricks documentation on the lineage system tables: https://docs.databricks.com/en/administration-guide/system-tables/lineage.html

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

hsheth2 commented 9 months ago

@dmoore247 we certainly want to do this, so marking the issue as accepted. Can't give a concrete timeline on it though.