Open dmoore247 opened 1 year ago
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
@dmoore247 we certainly want to do this, so marking the issue as accepted. Can't give a concrete timeline on it though.
Describe the bug The Databricks connector page says lineage is fetched through the lineage API on a per-table basis, which is not scalable.
To Reproduce See docs: https://datahubproject.io/docs/generated/ingestion/sources/databricks/
Expected behavior The connector should use the scalable system tables to access all the lineage events:
system.access.table_lineage
and system.access.column_lineage
Additional context Databricks documentation on lineage tables https://docs.databricks.com/en/administration-guide/system-tables/lineage.html
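To make the request concrete, here is a rough sketch of the bulk approach: one query over system.access.table_lineage instead of one lineage API call per table. This is not how the DataHub connector works today, and the helper names (build_table_lineage_query, group_upstreams) are hypothetical. The column names (source_table_full_name, target_table_full_name, event_date) are taken from my reading of the Databricks lineage system table docs and should be checked against your workspace. Running the query needs a SQL warehouse connection, which is left out here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_table_lineage_query(lookback_days: int = 30) -> str:
    """Build one SQL statement that returns every table-to-table edge.

    Column names are assumptions based on the Databricks system table docs.
    """
    if lookback_days < 1:
        raise ValueError("lookback_days must be at least 1")
    return (
        "SELECT DISTINCT source_table_full_name, target_table_full_name "
        "FROM system.access.table_lineage "
        "WHERE source_table_full_name IS NOT NULL "
        "AND target_table_full_name IS NOT NULL "
        f"AND event_date >= date_sub(current_date(), {int(lookback_days)})"
    )


def group_upstreams(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (source, target) rows into a target -> set-of-upstreams mapping."""
    upstreams: Dict[str, Set[str]] = defaultdict(set)
    for source, target in rows:
        # Skip self-referencing edges such as an in-place MERGE or UPDATE.
        if source != target:
            upstreams[target].add(source)
    return dict(upstreams)


if __name__ == "__main__":
    print(build_table_lineage_query(lookback_days=7))
    # Made-up rows standing in for the query result.
    sample_rows = [
        ("main.raw.orders", "main.silver.orders"),
        ("main.raw.customers", "main.silver.orders"),
        ("main.silver.orders", "main.silver.orders"),
    ]
    print(group_upstreams(sample_rows))
```

The same pattern would apply to system.access.column_lineage, with the column name fields added to the SELECT list.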