blacklabnz / blacklabnz.github.io


posts/purview-lineage-manual/ #71

Open utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

Purview Lineage: Part A Databricks Manual Lineage | blacklabnz | Data | DevOps

Purview has been published by Microsoft as a unified data governance solution to help manage and govern your multi-cloud, SaaS and on-premises data. You can create a holistic and up-to-date view of your data landscape with automated data discovery, data classification and end-to-end lineage. This provides data users with valuable, trustworthy data management. While the auto-scanned lineage is useful most of the time, there are always cases where you need to generate your lineage graph manually.

https://blacklabnz.github.io/posts/purview-lineage-manual/

lokhor commented 1 year ago

Could you use PyApacheAtlas from Azure Synapse data pipelines? I'm keen to see it working, to allow something similar with Synapse when using actions that are not natively supported for Purview lineage.

blacklabnz commented 1 year ago

Hi @lokhor, thanks for your question. I believe you would be able to do this in both the Databricks cluster runtime and the Synapse Spark runtime. In my experience, you can install Python packages as long as you have connectivity to a package repository, e.g. PyPI or an enterprise artifacts location. You could then run the scripts there if that is your choice. For the purposes of the demo I decided to run it locally, with a view to automating it somewhere in the pipeline for production workloads. Hope that answers your question.
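
To make that a bit more concrete, here is a rough sketch of what a notebook cell could do, in either runtime, before any lineage is pushed. It is not taken from the post. It uses only the standard library: it first checks whether `pyapacheatlas` is importable in the current session, then builds a minimal Atlas-style lineage payload (two `DataSet` entities linked by a `Process`) as plain dictionaries. The function names, the `demo://` qualified names and the choice of the generic `DataSet` and `Process` types are all assumptions for illustration. A real run would use the types registered in your Purview account and send the payload through PyApacheAtlas or the Atlas REST API.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if the top-level package `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def build_lineage_payload(source_qn: str, target_qn: str,
                          process_qn: str, process_name: str) -> dict:
    """Build an Atlas-style bulk payload: source -> process -> target.

    Negative GUIDs are placeholders that Atlas replaces with real GUIDs
    when new entities are created. All names here are hypothetical.
    """
    def dataset(qualified_name: str, guid: str) -> dict:
        return {
            "typeName": "DataSet",  # generic Atlas base type (assumption)
            "guid": guid,
            "attributes": {
                "name": qualified_name.rsplit("/", 1)[-1],
                "qualifiedName": qualified_name,
            },
        }

    def ref(entity: dict) -> dict:
        # Reference another entity in the same payload by its placeholder GUID.
        return {"guid": entity["guid"], "typeName": entity["typeName"]}

    source = dataset(source_qn, "-1")
    target = dataset(target_qn, "-2")
    process = {
        "typeName": "Process",  # generic Atlas base type (assumption)
        "guid": "-3",
        "attributes": {
            "name": process_name,
            "qualifiedName": process_qn,
            "inputs": [ref(source)],
            "outputs": [ref(target)],
        },
    }
    return {"entities": [source, target, process]}


if __name__ == "__main__":
    if not package_available("pyapacheatlas"):
        print("pyapacheatlas is not installed in this runtime")
    payload = build_lineage_payload(
        "demo://source/table_a",      # hypothetical qualified names
        "demo://target/table_b",
        "demo://process/copy_a_to_b",
        "copy_a_to_b",
    )
    print(len(payload["entities"]), "entities prepared")
```

If the availability check fails, a session-scoped install from the notebook (for example `%pip install pyapacheatlas`) is the usual route, provided the cluster can reach PyPI or an internal mirror.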