gbrueckl / Databricks.API.PowerShell

PowerShell wrapper for the Databricks API
MIT License

Editing a unity catalog cluster resets the data_security_mode #83

Closed: DGossman closed this issue 1 year ago

DGossman commented 1 year ago

When I use the Update-DatabricksCluster cmdlet to update a cluster that was created through the UI as a single user cluster with Unity Catalog enabled, the cluster access mode changes to "Custom" and Unity Catalog is no longer enabled (screenshot of the cluster configuration attached).

Comparing the cluster JSON before and after the update, data_security_mode and single_user_name are gone. These are the only properties related to Unity Catalog, and they come back if I manually set the access mode back to single user through the UI.

JSON Before:

{
    "autoscale": {
        "min_workers": 1,
        "max_workers": 4
    },
    "cluster_name": "DB-DP-SVIS-Cluster",
    "spark_version": "13.3.x-scala2.12",
    "spark_conf": {
        "databricks.spark.dbutils.fs.cp.server-side.enabled": "false",
        "spark.driver.maxResultSize": "16g",
        "spark.databricks.delta.preview.enabled": "true"
    },
    "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS4_v2",
    "driver_node_type_id": "Standard_DS4_v2",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "autotermination_minutes": 30,
    "enable_elastic_disk": false,
    "cluster_source": "UI",
    "init_scripts": [],
    "single_user_name": "XYZ",
    "enable_local_disk_encryption": false,
    "data_security_mode": "SINGLE_USER",
    "cluster_id": "XYZ"
}

JSON After:

{
    "autoscale": {
        "min_workers": 1,
        "max_workers": 4
    },
    "cluster_name": "DB-DP-SVIS-Cluster",
    "spark_version": "13.3.x-scala2.12",
    "spark_conf": {
        "databricks.spark.dbutils.fs.cp.server-side.enabled": "false",
        "spark.driver.maxResultSize": "16g",
        "spark.databricks.delta.preview.enabled": "true"
    },
    "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS4_v2",
    "driver_node_type_id": "Standard_DS4_v2",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "autotermination_minutes": 30,
    "enable_elastic_disk": false,
    "cluster_source": "UI",
    "init_scripts": [],
    "enable_local_disk_encryption": false,
    "cluster_id": "XYZ"
}

It seems like these two parameters need to be specified when doing an update through the API, but they are not available in the Update cmdlets. Is this something that can be added, or is there a workaround for the above issue?
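
For illustration, a minimal sketch of what an edit request would have to carry to keep these settings, assuming the edit API resets any field that is omitted from the request (the values are taken from the "JSON Before" above):

$body = @{
    cluster_id         = "XYZ"
    cluster_name       = "DB-DP-SVIS-Cluster"
    spark_version      = "13.3.x-scala2.12"
    node_type_id       = "Standard_DS4_v2"
    autoscale          = @{ min_workers = 1; max_workers = 4 }
    data_security_mode = "SINGLE_USER"   # dropped by the update unless re-sent
    single_user_name   = "XYZ"           # dropped by the update unless re-sent
}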

gbrueckl commented 1 year ago

New properties are added to the API at a very fast pace, and it is hard to keep everything up to date and expose dedicated parameters for every new property.

I would recommend using the -ClusterObject parameter instead and passing in an object with all the properties you need. So you would get the current cluster object, change it, and push it back. Something like this:

# Get the current cluster definition, change the property you need, and push it back
$obj = Get-DatabricksCluster -ClusterID "asdf-asdf-asdf"
$obj.myProperty = 123
$obj | Update-DatabricksCluster
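
Applied to this issue, the same pattern would look roughly like the sketch below (assuming Get-DatabricksCluster returns the full cluster definition shown in the JSON above, including the Unity Catalog fields):

$obj = Get-DatabricksCluster -ClusterID "XYZ"
$obj.data_security_mode = "SINGLE_USER"   # re-apply the Unity Catalog access mode
$obj.single_user_name   = "XYZ"           # re-apply the single user assignment
$obj | Update-DatabricksCluster
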
DGossman commented 1 year ago

Thanks @gbrueckl. This worked in principle, but the Update-DatabricksCluster cmdlet gave the below error, so I ended up using Invoke-DatabricksApiRequest -Method "POST" -EndPoint "/2.0/clusters/edit" -Body $config instead, which did work. Where

# Build a hashtable request body from the properties of the cluster object
$config = @{}
$obj.psobject.properties | ForEach-Object { $config[$_.Name] = $_.Value }

(screenshot of the error attached)
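
Putting the pieces together, the full workaround looks roughly like this sketch (it assumes Get-DatabricksCluster returns the complete cluster definition, including the Unity Catalog fields, as in the JSON above):

# Fetch the current cluster definition
$obj = Get-DatabricksCluster -ClusterID "XYZ"

# Convert the returned object into a hashtable to use as the request body
$config = @{}
$obj.psobject.properties | ForEach-Object { $config[$_.Name] = $_.Value }

# Make sure the Unity Catalog related fields are part of the edit request
$config["data_security_mode"] = "SINGLE_USER"
$config["single_user_name"]   = "XYZ"

# Push the full definition back via the raw clusters edit endpoint
Invoke-DatabricksApiRequest -Method "POST" -EndPoint "/2.0/clusters/edit" -Body $config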