databricks / databricks-sdk-py

Databricks SDK for Python (Beta)
https://databricks-sdk-py.readthedocs.io/
Apache License 2.0
372 stars 125 forks

[ISSUE] migrate-tables DBUtilsCore.mounts() is not whitelisted on class error #718

Open maruppel opened 3 months ago

maruppel commented 3 months ago

Description

When running the migrate-tables workflow, a Py4JSecurityException is thrown by every migrate task.

Reproduction

When running the migrate-tables workflow from the Databricks UI, all migrate tasks (dbfs-root/non-delta/external/views) fail with the error below.

Expected behavior

The migrate-tables workflow succeeds in migrating the tables and views.

Is it a regression?

These workflows have been run in multiple workspaces, typically with UCX version 0.27.1, without this error appearing before. The currently running version is 0.28.2.

Debug Logs

16:46:22 DEBUG [databricks] {MainThread} Task crash details
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py", line 100, in trigger
    current_task(ctx)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/workflows.py", line 29, in migrate_dbfs_root_delta_tables
    ctx.tables_migrator.migrate_tables(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/table_migrate.py", line 80, in migrate_tables
    all_principal_grants = None if acl_strategy is None else self._principal_grants.get_interactive_cluster_grants()
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/grants.py", line 557, in get_interactive_cluster_grants
    mounts = list(self._mounts_crawler.snapshot())
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/locations.py", line 252, in snapshot
    return self._snapshot(self._try_fetch, self._list_mounts)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/crawlers.py", line 116, in _snapshot
    loaded_records = list(loader())
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/locations.py", line 247, in _list_mounts
    for mount_point, source, _ in self._dbutils.fs.mounts():
  File "/databricks/python_shell/dbruntime/dbutils.py", line 362, in f_with_exception_handling
    return f(*args, **kwargs)
  File "/databricks/python_shell/dbruntime/dbutils.py", line 497, in mounts
    self.print_return(self.dbcore.mounts()), MountInfo.create_from_jschema)
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 224, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 330, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling o432.mounts. Trace:
py4j.security.Py4JSecurityException: Method public com.databricks.backend.daemon.dbutils.DBUtilsCore$Result com.databricks.backend.daemon.dbutils.DBUtilsCore.mounts() is not whitelisted on class class com.databricks.backend.daemon.dbutils.DBUtilsCore
    at py4j.security.WhitelistingPy4JSecurityManager.checkCall(WhitelistingPy4JSecurityManager.java:473)
    at py4j.Gateway.invoke(Gateway.java:305)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
    at java.lang.Thread.run(Thread.java:750)

Other Information

JCZuurmond commented 2 months ago

Moving this to ucx: https://github.com/databrickslabs/ucx/issues/2498

JCZuurmond commented 2 months ago

@maruppel : What Databricks runtime is the migrate-tables workflow failing for?

maruppel commented 2 months ago

> @maruppel : What Databricks runtime is the migrate-tables workflow failing for?

DBR is 15.3, also this is after assessment has been run, which was a question in the other issue.

JCZuurmond commented 2 months ago

@maruppel : Thank you for reporting back. I'll tackle the question about the assessment being run in the other issue, and will cover the mounts whitelist issue here:

I tried to reproduce the error using the following (shortened) code path from ucx:

from databricks.sdk import WorkspaceClient
ws = WorkspaceClient()
ws.dbutils.fs.mounts()
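When running that snippet on a cluster, the whitelist failure can be told apart from other errors by matching on the exception text, since the `py4j` exception types are only importable on the cluster itself. A small heuristic sketch (the function name is hypothetical):

```python
def is_whitelist_error(exc: Exception) -> bool:
    """Heuristic: does this exception look like a Py4J whitelisting failure?
    Matching on the message avoids importing py4j types off-cluster."""
    return "is not whitelisted" in str(exc)

# Synthetic exception mirroring the reported error, for illustration only:
err = RuntimeError(
    "py4j.protocol.Py4JError: An error occurred while calling o432.mounts. "
    "Trace: py4j.security.Py4JSecurityException: Method ... is not whitelisted"
)
print(is_whitelist_error(err))  # True
```

A guard like this could, for example, decide whether to fall back to another way of listing mounts rather than crashing the whole task.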

You mentioned that the workflow worked before with ucx version 0.27.1 and not anymore with version 0.28.2. The difference in sdk dependency is:

When I run the above code snippet after installing the sdk 0.29.0 on a cluster with DBR 15.3 in AWS (see complete configuration below), I do not receive the same whitelist error.

%pip install databricks-sdk~=0.29.0
dbutils.library.restartPython()
Cluster configuration { "cluster_id": "REDACTED", "creator_user_name": "REDACTED", "driver": { "private_ip": "REDACTED", "node_id": "REDACTED", "instance_id": "i-REDACTED", "start_timestamp": 1724916806770, "node_aws_attributes": { "is_spot": false }, "node_attributes": { "is_spot": false }, "host_private_ip": "REDACTED" }, "spark_context_id": 1513542293054547000, "driver_healthy": true, "jdbc_port": 10000, "cluster_name": "cor-test-cluster", "spark_version": "15.3.x-scala2.12", "spark_conf": { "spark.master": "local[*, 4]", "spark.databricks.cluster.profile": "singleNode" }, "aws_attributes": { "first_on_demand": 1, "availability": "SPOT_WITH_FALLBACK", "zone_id": "auto", "spot_bid_price_percent": 100, "ebs_volume_count": 0 }, "node_type_id": "r6id.large", "driver_node_type_id": "r6id.large", "custom_tags": { "ResourceClass": "SingleNode" }, "autotermination_minutes": 30, "enable_elastic_disk": true, "disk_spec": { "disk_count": 0 }, "cluster_source": "UI", "single_user_name": "REDACTED", "enable_local_disk_encryption": false, "instance_source": { "node_type_id": "r6id.large" }, "driver_instance_source": { "node_type_id": "r6id.large" }, "data_security_mode": "SINGLE_USER", "runtime_engine": "STANDARD", "effective_spark_version": "15.3.x-scala2.12", "state": "RUNNING", "state_message": "", "start_time": 1724852122131, "last_state_loss_time": 1724916949310, "last_activity_time": 1724916983193, "last_restarted_time": 1724916949353, "num_workers": 0, "cluster_memory_mb": 16384, "cluster_cores": 2, "default_tags": { "Vendor": "Databricks", "Creator": "REDACTED", "ClusterName": "cor-test-cluster", "ClusterId": "REDACTED", "Budget": "opex.sales.labs", "Owner": "REDACTED" }, "init_scripts_safe_mode": false, "spec": { "cluster_name": "cor-test-cluster", "spark_version": "15.3.x-scala2.12", "spark_conf": { "spark.master": "local[*, 4]", "spark.databricks.cluster.profile": "singleNode" }, "aws_attributes": { "first_on_demand": 1, "availability": "SPOT_WITH_FALLBACK", 
"zone_id": "auto", "spot_bid_price_percent": 100, "ebs_volume_count": 0 }, "node_type_id": "r6id.large", "driver_node_type_id": "r6id.large", "custom_tags": { "ResourceClass": "SingleNode" }, "autotermination_minutes": 30, "enable_elastic_disk": true, "single_user_name": "REDACTED", "enable_local_disk_encryption": false, "data_security_mode": "SINGLE_USER", "runtime_engine": "STANDARD", "effective_spark_version": "14.3.x-scala2.12", "num_workers": 0, "apply_policy_default_values": false } }
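One detail worth noting in that configuration is `"data_security_mode": "SINGLE_USER"`: Py4J method whitelisting is typically enforced on clusters with shared access modes, not single-user ones, which may explain why the reproduction attempt did not hit the error. A sketch of screening a cluster spec for this (the set of enforcing modes here is an assumption, not an official reference; verify against Databricks documentation):

```python
# Access modes where Py4J whitelisting is commonly enforced.
# ASSUMPTION: this list is illustrative, not exhaustive or authoritative.
WHITELIST_ENFORCED_MODES = {"USER_ISOLATION", "LEGACY_TABLE_ACL"}

def expects_py4j_whitelist(cluster_spec: dict) -> bool:
    """Return True if the cluster's access mode usually enforces Py4J whitelisting."""
    return cluster_spec.get("data_security_mode") in WHITELIST_ENFORCED_MODES

# The reported reproduction cluster is single-user:
spec = {"data_security_mode": "SINGLE_USER", "spark_version": "15.3.x-scala2.12"}
print(expects_py4j_whitelist(spec))  # False
```

Comparing the reporter's failing cluster spec against this kind of check could narrow down whether the access mode differs between the two environments.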

Could you try the above code snippets in your environment and report back whether they recreate the issue for you?

And, does the issue persist with the latest ucx version?

JCZuurmond commented 2 months ago

Note that you can verify the installed sdk version:

from databricks.sdk.version import __version__
print(__version__)
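The printed version can also be compared against a minimum programmatically. A stdlib-only sketch (the minimum version used here is illustrative, matching the version from the reproduction attempt above):

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a 'major.minor.patch' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])

def at_least(installed: str, minimum: str) -> bool:
    """Return True if the installed version meets or exceeds the minimum."""
    return parse_version(installed) >= parse_version(minimum)

print(at_least("0.29.0", "0.29.0"))  # True
print(at_least("0.28.2", "0.29.0"))  # False
```

In a notebook this could be fed the SDK's `__version__` directly to rule out a stale install after `%pip install`.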

maruppel commented 2 months ago

Running the above code, I am getting the same whitelist error with sdk==0.29.0.

JCZuurmond commented 2 months ago

Okay, thank you. If you have an idea what could cause the whitelist error, please share. It sounds like some network restrictions are causing this issue.

Otherwise, I will leave it to the sdk team to resolve this.