Azure / data-landing-zone

Template to deploy a single Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Landing Zone is a logical construct and a unit of scale in the architecture that enables data retention and execution of data workloads for generating insights and value with data.
MIT License

Data Factory and Purview account are not connected #114

Open zeinab-mk opened 3 years ago

zeinab-mk commented 3 years ago

After the deployment completed, I did not see the catalogUri tag on the ADF resource, and the ADF connection showed a Disconnected status in Azure Purview.

marvinbuss commented 3 years ago

That is a good point. We will add the role assignment to the ARM templates so that Purview automatically has access. We will probably give the Purview MSI access to the whole subscription so that it can also scan all kinds of data sources.

Ideally, we would add the Purview MSI as Reader on the Management Group so that it can scan all kinds of data assets within the tenant. However, this is not something we can do automatically without the required access rights.

Therefore, I would suggest adding it to each Landing Zone as part of the Landing Zone deployment. @mboswell any thoughts, or do you agree?
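For illustration only, the subscription-level Reader assignment discussed above could be sketched as follows. This is a minimal sketch and not the template used in this repository: the function name `reader_role_assignment`, the placeholder IDs, and the `apiVersion` value are assumptions. The only fixed value is the GUID of the built-in Reader role. The function builds the `Microsoft.Authorization/roleAssignments` resource as a plain dictionary that could be embedded in a subscription-scoped ARM template.

```python
import json
import uuid

# GUID of the built-in "Reader" role definition (identical in every tenant).
READER_ROLE_ID = "acdd72a7-3385-48ef-bd42-f606fba81ae7"


def reader_role_assignment(subscription_id: str, principal_id: str) -> dict:
    """Build an ARM role assignment resource granting Reader on a subscription.

    `principal_id` is assumed to be the object ID of the Purview managed identity.
    """
    scope = f"/subscriptions/{subscription_id}"
    # Role assignment names must be GUIDs. Deriving the GUID from the inputs keeps
    # redeployments idempotent (similar in spirit to the ARM guid() function).
    name = str(
        uuid.uuid5(uuid.NAMESPACE_URL, f"{scope}/{principal_id}/{READER_ROLE_ID}")
    )
    return {
        "type": "Microsoft.Authorization/roleAssignments",
        "apiVersion": "2022-04-01",  # assumption: pick the version your deployment targets
        "name": name,
        "properties": {
            "roleDefinitionId": (
                f"{scope}/providers/Microsoft.Authorization/roleDefinitions/{READER_ROLE_ID}"
            ),
            "principalId": principal_id,
            "principalType": "ServicePrincipal",
        },
    }


if __name__ == "__main__":
    # Placeholder IDs, not real resources.
    resource = reader_role_assignment(
        "00000000-0000-0000-0000-000000000000",
        "11111111-1111-1111-1111-111111111111",
    )
    print(json.dumps(resource, indent=2))
```

The same shape would apply at Management Group scope with a different scope prefix, but as noted above that requires access rights the deployment identity usually does not have.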

marvinbuss commented 3 years ago

Same issue as https://github.com/Azure/data-landing-zone/issues/115.

marvinbuss commented 3 years ago

This actually requires adding the MSI of Data Factory as "Purview Data Curator". This is not required for Synapse. Follow-up required from my side.

marvinbuss commented 3 years ago

We will not add this for now. When all services (e.g. Purview, Synapse, Data Factory) are behind private endpoints, a SHIR and a Service Principal are required for scans anyway, so this role assignment is not actually needed when using private endpoints end-to-end.

marvinbuss commented 3 years ago

https://github.com/Azure/data-landing-zone/pull/190 will add private link connectivity for ADF. Synapse does not expose private endpoints via ARM, so we cannot automate the setup in Synapse.

marvinbuss commented 3 years ago

All the role assignments for Purview have now been moved into the data plane. Hence, without using self-hosted agents, we are not able to access a private Purview instance. That means we cannot make any role assignments from ARM to a collection, other than the Collection Admin role assignment on the root collection. In summary, all ADF and Synapse role assignments currently have to be made manually by a user in the Purview Portal.

marvinbuss commented 2 years ago

Update: I am working on full automation of lineage and data source onboarding here: https://github.com/marvinbuss/PurviewAutomation