Azure / bicep

Bicep is a declarative language for describing and deploying Azure resources
MIT License

Databricks Unity Catalog support #10115

(Open) villepuntanen opened this issue 1 year ago

villepuntanen commented 1 year ago

The recommended pattern for using Azure Databricks is to set up Unity Catalog to manage and govern the Databricks environment and to make use of several features.

Currently, Unity Catalog can be set up with Terraform (Link to docs), but the same is needed in Bicep.

leinoaa commented 1 year ago

This would be a great addition!

alex-frankel commented 1 year ago

It looks like these Databricks resources are not ARM control plane resources. In Terraform, they are managed through a dedicated Databricks provider, so this would only be possible if someone builds a Databricks provider for Bicep via Bicep extensibility.
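The control plane / data plane split described above can be made concrete by comparing where the two kinds of requests go. This minimal Python sketch builds the ARM URL for a Databricks workspace (`Microsoft.Databricks/workspaces`, which ARM, and therefore Bicep, can deploy) and the URL for creating a Unity Catalog catalog, which is sent to the workspace's own REST API instead. The helper names, the placeholder identifiers and host name, and the `2023-02-01` api-version are assumptions for illustration only; check the current ARM and Databricks REST API references before relying on them.

```python
from urllib.parse import urlencode

# Azure Resource Manager (ARM) endpoint for the public Azure cloud.
ARM_ENDPOINT = "https://management.azure.com"


def arm_workspace_url(subscription_id: str, resource_group: str,
                      workspace_name: str, api_version: str) -> str:
    """Control plane: a Databricks workspace is an ARM resource of type
    Microsoft.Databricks/workspaces, so it is addressed through ARM."""
    path = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/workspaces/{workspace_name}"
    )
    return f"{ARM_ENDPOINT}{path}?{urlencode({'api-version': api_version})}"


def unity_catalog_catalogs_url(workspace_host: str) -> str:
    """Data plane from ARM's point of view: Unity Catalog objects are managed
    through the workspace's own REST API, not through management.azure.com."""
    return f"https://{workspace_host}/api/2.1/unity-catalog/catalogs"


def is_arm_control_plane(url: str) -> bool:
    """Return True if the URL goes through the ARM endpoint. Bicep compiles to
    ARM templates, so without an extensibility provider only such requests are
    reachable from a Bicep deployment."""
    return url.startswith(ARM_ENDPOINT + "/")


if __name__ == "__main__":
    # Placeholder (hypothetical) identifiers; the api-version is an assumption.
    ws = arm_workspace_url("00000000-0000-0000-0000-000000000000",
                           "rg-analytics", "dbw-demo", "2023-02-01")
    uc = unity_catalog_catalogs_url("adb-1234567890123456.7.azuredatabricks.net")
    print(is_arm_control_plane(ws))  # True
    print(is_arm_control_plane(uc))  # False
```

The workspace itself can already be declared in Bicep; the catalog request is the kind of call that, per this thread, would need a Databricks provider built on Bicep extensibility.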

aucampia commented 1 year ago

Related: https://github.com/Azure/bicep/issues/9967

You are going to keep getting this question again and again until the inaccuracies in the documentation are fixed. Selling Bicep as "Day 0 resource provider support. Any Azure resource — whether in private or public preview or GA — can be provisioned using Bicep." and then clarifying later that "any Azure resource" actually means only some of them is not an acceptable thing to do.

It is misleading and results in people making decisions based on false information.

aucampia commented 1 year ago

> so this would only be possible if someone builds a databricks provider for bicep via bicep extensibility

Do you have any documentation on how to do something like this?

alex-frankel commented 1 year ago

For this to happen in the short term, we would need the team that manages the Databricks RP to implement it since we are only supporting first-party maintained providers for now.

villepuntanen commented 1 year ago

> > so this would only be possible if someone builds a databricks provider for bicep via bicep extensibility

> Do you have any documentation on how to do something like this?

Hi, I've been looking into this a bit. Here's one example of a provider that uses extensibility: https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/bicep-extensibility-kubernetes-provider

jikuja commented 1 year ago

> You are going to keep getting this question again and again until the inaccuracies in the documentation are fixed. Selling Bicep as "Day 0 resource provider support. Any Azure resource — whether in private or public preview or GA — can be provisioned using Bicep." and then clarifying later that "any Azure resource" actually means only some of them is not an acceptable thing to do.

> It is misleading and results in people making decisions based on false information.

https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/control-plane-and-data-plane should help in understanding the differences between the Azure control plane and data plane.

aucampia commented 1 year ago

> https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/control-plane-and-data-plane should help understanding Azure control and data plane differences

Can you clarify how this will help anyone do declarative management of Azure Databricks resources with Bicep? If this is not in scope for Bicep, it is best to just close this issue as "won't fix".

I don't think trying to reclassify the Azure Databricks control plane [ref] as a data plane is very productive or addresses any of the problems that anyone has. The core issue is that there are Azure Databricks resources that can be declaratively managed using other declarative resource management tools, that should be declaratively managed if modern engineering practices are followed, and that there is an expectation that the same is supported by Bicep, as Bicep says on the tin "Day 0 resource provider support. Any Azure resource — whether in private or public preview or GA — can be provisioned using Bicep".

Even if you somehow convinced people that the Azure Databricks control plane [ref] is not a control plane, they would still want to declaratively manage Azure Databricks resources, and they would still expect a tool that says "Day 0 resource provider support. Any Azure resource — whether in private or public preview or GA — can be provisioned using Bicep" to offer them declarative management of Azure Databricks resources.

jikuja commented 1 year ago

> Can you clarify how this will help anyone do declarative management of Azure Databricks resources with Bicep?

That link is reference documentation for understanding what the control plane and data plane are for ARM resources.

> I don't think trying to reclassify the Azure Databricks control plane [ref] as a data plane is very productive or addresses any of the problems that anyone has.

Well, it is a data plane from ARM's point of view. The Databricks documentation should probably mention that this is a Databricks-specific control plane that is not available via ARM.

> they would still want to declaratively manage Azure Databricks resources

If you read @alex-frankel's message, you will have noticed that data planes will be handled at some point with Bicep providers.

> The core issue is that there are Azure Databricks resources that can be declaratively managed using other declarative resource management tools, that should be declaratively managed if modern engineering practices are followed,

I know, and I really hope it will be part of Bicep at some point. Setting up Databricks with scripts is not a good process.

> an expectation that the same is supported by Bicep, as Bicep says on the tin "Day 0 resource provider support. Any Azure resource — whether in private or public preview or GA — can be provisioned using Bicep".

I'm not sure how common this expectation is.

aucampia commented 1 year ago

> > an expectation that the same is supported by Bicep, as Bicep says on the tin "Day 0 resource provider support. Any Azure resource — whether in private or public preview or GA — can be provisioned using Bicep".

> I'm not sure how common this expectation is.

It would be less common if your documentation were updated to clarify that it is talking about Azure Resource Manager resources and that other resources are out of scope.

alex-frankel commented 1 year ago

We are going to get the docs updated this week to help clarify this, but @jikuja is right: Databricks is a data plane from the perspective of ARM.

@aucampia, we would like to keep the issue open because the framework now exists to enable support for these scenarios. It is just a question of whether anyone has the capacity to implement it. If many others want or need this, that will help us revisit the priority and get it done sooner. Right now, there is no commitment that the Databricks team will be able to get this done, so we cannot provide an ETA.

aucampia commented 1 year ago

> @aucampia, we would like to keep the issue open because the framework now exists to enable support for these scenarios.

Is there any documentation for this? I looked briefly at how the K8s extension is implemented, but reverse-engineering it to understand how to build one for Databricks would take more time than I have available. I would still really like to get a better picture of the capabilities. For example, how is state managed? Do extensions require state management to be deferred? Will this integrate with deployment stacks at all?

alex-frankel commented 1 year ago

There is no documentation because third parties cannot contribute their own providers at this point. Only the Databricks team or Microsoft can resolve this for Databricks.

There is no state management required and we do plan to integrate this with deployment stacks.
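To illustrate what "no state management required" can mean for a declarative tool, here is a minimal conceptual sketch under a simplified in-memory model: the desired resources are compared directly against what currently exists in the target system, and only differing resources are written with a create-or-replace (PUT-style) operation, so no separate state file is kept. The `reconcile` function, its data model, and the resource names are hypothetical; this does not describe how Bicep or its extensibility providers are actually implemented.

```python
from typing import Dict, List

# Hypothetical model: each resource is identified by name and described by a
# dict of properties. `live` stands in for the real target system.
Resource = Dict[str, object]


def reconcile(desired: Dict[str, Resource], live: Dict[str, Resource]) -> List[str]:
    """Make `live` match `desired` without consulting any stored state.

    Returns the names of resources that were created or updated. Resources
    present in `live` but absent from `desired` are left untouched, similar
    to an ARM incremental deployment.
    """
    changed: List[str] = []
    for name, props in desired.items():
        # Read the current state from the target itself, not from a state file.
        if live.get(name) != props:
            # PUT-style create-or-replace: the same request works whether or
            # not the resource already exists.
            live[name] = dict(props)
            changed.append(name)
    return changed


if __name__ == "__main__":
    target: Dict[str, Resource] = {"old-catalog": {"comment": "unmanaged"}}
    desired = {"sales": {"comment": "Sales data"}, "hr": {"comment": "HR data"}}

    print(reconcile(desired, target))  # ['sales', 'hr']
    print(reconcile(desired, target))  # [] (second run is a no-op)
```

Because the comparison is always made against the live target, running the same deployment twice changes nothing the second time.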