datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

Databricks lineage extraction capability: update the connector to use the system.access.[table|column]_lineage system tables #9168

Open: dmoore247 opened this issue 1 year ago

dmoore247 commented 1 year ago

Describe the bug: The Databricks connector documentation describes using the lineage API on a per-table basis, which does not scale to workspaces with many tables.

To reproduce: See the docs at https://datahubproject.io/docs/generated/ingestion/sources/databricks/

Expected behavior: The connector should read all lineage events from the scalable system tables, system.access.table_lineage and system.access.column_lineage.

Additional context: Databricks documentation on the lineage system tables: https://docs.databricks.com/en/administration-guide/system-tables/lineage.html
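A minimal sketch of the proposed approach, assuming the column names given in the Databricks lineage system-table documentation (`source_table_full_name`, `target_table_full_name`, `source_column_name`, `target_column_name`, `event_date`). One query per system table replaces the per-table API calls, and the result rows are grouped into upstream sets per downstream table or column. The function names here are hypothetical and not part of the DataHub connector; executing the SQL (for example over a Databricks SQL connection, with each result row converted to a dict) is left out.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Mapping, Optional, Set, Tuple

# Column names are assumed from the Databricks lineage system-table docs;
# verify them against your workspace before relying on this sketch.

Row = Mapping[str, Optional[str]]
ColumnRef = Tuple[str, str]  # (table_full_name, column_name)


def table_lineage_sql(since: date) -> str:
    """Hypothetical helper: one query for every table-level edge since `since`."""
    return (
        "SELECT DISTINCT source_table_full_name, target_table_full_name\n"
        "FROM system.access.table_lineage\n"
        f"WHERE event_date >= DATE '{since.isoformat()}'\n"
        "  AND source_table_full_name IS NOT NULL\n"
        "  AND target_table_full_name IS NOT NULL"
    )


def column_lineage_sql(since: date) -> str:
    """Hypothetical helper: the same idea for column-level edges."""
    return (
        "SELECT DISTINCT source_table_full_name, source_column_name,\n"
        "       target_table_full_name, target_column_name\n"
        "FROM system.access.column_lineage\n"
        f"WHERE event_date >= DATE '{since.isoformat()}'\n"
        "  AND source_table_full_name IS NOT NULL\n"
        "  AND target_table_full_name IS NOT NULL"
    )


def build_table_lineage(rows: Iterable[Row]) -> Dict[str, Set[str]]:
    """Group result rows into: downstream table -> set of upstream tables."""
    upstreams: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        src = row.get("source_table_full_name")
        dst = row.get("target_table_full_name")
        # Path-based sources/targets may have no table name; skip those rows.
        if src and dst:
            upstreams[dst].add(src)
    return dict(upstreams)


def build_column_lineage(rows: Iterable[Row]) -> Dict[ColumnRef, Set[ColumnRef]]:
    """Group result rows into: downstream column -> set of upstream columns."""
    upstreams: Dict[ColumnRef, Set[ColumnRef]] = defaultdict(set)
    for row in rows:
        src_table = row.get("source_table_full_name")
        src_col = row.get("source_column_name")
        dst_table = row.get("target_table_full_name")
        dst_col = row.get("target_column_name")
        # Only keep rows where both ends are fully identified columns.
        if src_table and src_col and dst_table and dst_col:
            upstreams[(dst_table, dst_col)].add((src_table, src_col))
    return dict(upstreams)


if __name__ == "__main__":
    # Hypothetical sample rows, shaped like the query results above.
    sample = [
        {"source_table_full_name": "main.raw.orders", "target_table_full_name": "main.silver.orders"},
        {"source_table_full_name": "main.raw.customers", "target_table_full_name": "main.silver.orders"},
    ]
    print(table_lineage_sql(date(2024, 1, 1)))
    print(build_table_lineage(sample))
```

Because each query returns all edges for the chosen time window in one result set, the number of round trips no longer grows with the number of tables, which is the scalability concern raised in this issue.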

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

hsheth2 commented 12 months ago

@dmoore247 we certainly want to do this, so marking the issue as accepted. Can't give a concrete timeline on it though.