Closed jarnawer closed 5 months ago
@jarnawer Please include the exact stack trace.
Hi @nfx, which exact stack trace do you mean? I can provide the standard error output, but it has the same information.
@jarnawer we need to know the exact exception type it fails with and exact methods and lines. it has to be in that log ;)
Execute `databricks workspace export //Applications/ucx/logs/assessment/run-413746228294294/crawl_permissions.log` locally to troubleshoot with more details. Model serving is not enabled for your shard. Please contact your organization admin or Databricks support.
Wow, apologies for not reading that part. I missed it completely, sorry. Here is the export of that log:
15:19:46 INFO [databricks.labs.ucx] {MainThread} UCX v0.17.0 After job finishes, see debug logs at /Workspace/Applications/ucx/logs/assessment/run-413746228294294/crawl_permissions.log
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/preview/scim/v2/Groups?attributes=id,displayName,meta,roles,entitlements&startIndex=1&count=100
< 200 OK
< {
< "Resources": [
< {
< "displayName": "G-dspA0916001-chn-test-Contributor",
< "entitlements": [
< {
< "value": "**REDACTED**"
< },
< {
< "value": "**REDACTED**"
< },
< {
< "value": "**REDACTED**"
< },
< "... (1 additional elements)"
< ],
< "id": "133503787841503",
< "meta": {
< "resourceType": "Group"
< }
< },
< "... (7 additional elements)"
< ],
< "itemsPerPage": 8,
< "schemas": [
< "urn:ietf:params:scim:api:messages:2.0:ListResponse"
< ],
< "startIndex": 1,
< "totalResults": 8
< }
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/preview/scim/v2/Groups?attributes=id,displayName,meta,roles,entitlements&startIndex=9&count=100
< 200 OK
< {
< "itemsPerPage": 0,
< "schemas": [
< "urn:ietf:params:scim:api:messages:2.0:ListResponse"
< ],
< "startIndex": 9,
< "totalResults": 8
< }
15:19:46 INFO [databricks.labs.ucx.workspace_access.manager] {MainThread} Cleaning up inventory table hive_metastore.ucx.permissions
15:19:46 DEBUG [databricks.labs.lsql.backends] {MainThread} [spark][execute] DROP TABLE IF EXISTS hive_metastore.ucx.permissions
15:19:46 INFO [databricks.labs.ucx.workspace_access.manager] {MainThread} Inventory table cleanup complete
15:19:46 DEBUG [databricks.labs.ucx.workspace_access.manager] {MainThread} Crawling permissions
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/clusters/list
< 200 OK
< {
< "clusters": [
< {
< "autotermination_minutes": 0,
< "azure_attributes": {
< "availability": "ON_DEMAND_AZURE",
< "first_on_demand": 1,
< "spot_bid_max_price": -1.0
< },
< "cluster_cores": 4.0,
< "cluster_id": "0321-151452-bfwajp40",
< "cluster_memory_mb": 8192,
< "cluster_name": "job-261840359341744-run-413746228294294-main",
< "cluster_source": "JOB",
< "creator_user_name": "***",
< "custom_tags": {
< "ResourceClass": "SingleNode",
< "version": "v0.17.0"
< },
< "data_security_mode": "LEGACY_SINGLE_USER",
< "default_tags": {
< "ClusterId": "0321-151452-bfwajp40",
< "ClusterName": "job-261840359341744-run-413746228294294-main",
< "Creator": "***",
< "JobId": "261840359341744",
< "RunName": "[UCX] assessment",
< "Vendor": "Databricks",
< "applicationId": "A0916",
< "applicationName": "CIT-O DSP",
< "environment": "test",
< "expirationDate": "2022-12-31",
< "owner": "***",
< "platformId": "PLF0070",
< "requester": "***",
< "serviceCode": "MRCS"
< },
< "disk_spec": {},
< "driver": {
< "host_private_ip": "10.44.15.199",
< "instance_id": "2baab114e64041c2a62856d405663f62",
< "node_attributes": {
< "is_spot": false
< },
< "node_id": "d66c5c9136794e11b7c8fc857ba3eeac",
< "private_ip": "10.44.15.136",
< "public_dns": "",
< "start_timestamp": 1711034093240
< },
< "driver_healthy": true,
< "driver_instance_source": {
< "node_type_id": "Standard_F4s"
< },
< "driver_node_type_id": "Standard_F4s",
< "effective_spark_version": "14.3.x-scala2.12",
< "enable_elastic_disk": true,
< "enable_local_disk_encryption": false,
< "init_scripts_safe_mode": false,
< "instance_source": {
< "node_type_id": "Standard_F4s"
< },
< "jdbc_port": 10000,
< "last_activity_time": 1711034138911,
< "last_restarted_time": 1711034180128,
< "last_state_loss_time": 0,
< "node_type_id": "Standard_F4s",
< "num_workers": 0,
< "policy_id": "00103CF50C32348D",
< "single_user_name": "***",
< "spark_conf": {
< "spark.databricks.cluster.profile": "singleNode",
< "spark.master": "local[*]"
< },
< "spark_context_id": 2439339821249193680,
< "spark_version": "14.3.x-scala2.12",
< "start_time": 1711034092311,
< "state": "RUNNING",
< "state_message": ""
< },
< "... (27 additional elements)"
< ]
< }
15:19:46 INFO [databricks.labs.ucx.workspace_access.generic] {MainThread} Listed clusters in 0:00:00.047167
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/policies/clusters/list
< 200 OK
< {
< "policies": [
< {
< "created_at_timestamp": 1709814294000,
< "definition": "{\"access_mode\":{\"hidden\":true,\"type\":\"fixed\",\"value\":\"SINGLE_USER\"},\"autotermination_minutes\":{\"... (876 more bytes)",
< "is_default": false,
< "name": "Data Engineer Cluster Policy",
< "policy_id": "0004B0557261BC4E"
< },
< "... (9 additional elements)"
< ],
< "total_count": 10
< }
15:19:46 INFO [databricks.labs.ucx.workspace_access.generic] {MainThread} Listed cluster-policies in 0:00:00.038094
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/instance-pools/list
< 200 OK
< {}
15:19:46 INFO [databricks.labs.ucx.workspace_access.generic] {MainThread} Listed instance-pools in 0:00:00.049407
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/sql/warehouses
< 200 OK
< {
< "warehouses": [
< {
< "auto_resume": true,
< "auto_stop_mins": 120,
< "cluster_size": "X-Small",
< "creator_id": 8956159007198419,
< "creator_name": "9b6c040b-3cc8-4280-be90-f6e705f5c25d",
< "enable_photon": true,
< "enable_serverless_compute": false,
< "health": {
< "status": "HEALTHY"
< },
< "id": "267906863f8dfbbc",
< "jdbc_url": "jdbc:spark://adb-2923397360314017.17.azuredatabricks.net:443/default;transportMode=http;ssl=1;Au... (55 more bytes)",
< "max_num_clusters": 5,
< "min_num_clusters": 1,
< "name": "Data Engineer SQL Warehouse",
< "num_active_sessions": 0,
< "num_clusters": 1,
< "odbc_params": {
< "hostname": "adb-2923397360314017.17.azuredatabricks.net",
< "path": "/sql/1.0/warehouses/267906863f8dfbbc",
< "port": 443,
< "protocol": "https"
< },
< "size": "XSMALL",
< "spot_instance_policy": "COST_OPTIMIZED",
< "state": "RUNNING",
< "tags": {
< "custom_tags": [
< {
< "key": "user_group",
< "value": "**REDACTED**"
< }
< ]
< },
< "warehouse_type": "PRO"
< },
< "... (2 additional elements)"
< ]
< }
15:19:46 INFO [databricks.labs.ucx.workspace_access.generic] {MainThread} Listed sql/warehouses in 0:00:00.037511
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.1/jobs/list
< 200 OK
< {
< "has_more": true,
< "jobs": [
< {
< "created_time": 1710846542457,
< "creator_user_name": "***",
< "job_id": 261840359341744,
< "settings": {
< "email_notifications": {
< "no_alert_for_skipped_runs": false,
< "on_failure": [
< "***"
< ],
< "on_success": [
< "***"
< ]
< },
< "format": "MULTI_TASK",
< "max_concurrent_runs": 1,
< "name": "[UCX] assessment",
< "tags": {
< "version": "v0.17.0"
< },
< "timeout_seconds": 0
< }
< },
< "... (19 additional elements)"
< ],
< "next_page_token": "CAEo7rm7080xMJvl09WB65YB"
< }
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.1/jobs/list?page_token=CAEo7rm7080xMJvl09WB65YB
< 200 OK
< {
< "has_more": false,
< "jobs": [
< {
< "created_time": 1699007989618,
< "creator_user_name": "***",
< "job_id": 354194666446075,
< "settings": {
< "email_notifications": {
< "no_alert_for_skipped_runs": false
< },
< "format": "MULTI_TASK",
< "max_concurrent_runs": 10,
< "name": "create_table",
< "timeout_seconds": 0
< }
< },
< "... (1 additional elements)"
< ],
< "prev_page_token": "CAAo8o6SprkxMPu5mfq1xFA="
< }
15:19:46 INFO [databricks.labs.ucx.workspace_access.generic] {MainThread} Listed jobs in 0:00:00.162886
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/pipelines
< 200 OK
< {}
15:19:46 INFO [databricks.labs.ucx.workspace_access.generic] {MainThread} Listed pipelines in 0:00:00.027808
15:19:46 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/serving-endpoints
< 404 Not Found
< {
< "error_code": "FEATURE_DISABLED",
< "message": "Model serving is not enabled for your shard. Please contact your organization admin or Databrick... (10 more bytes)"
< }
15:19:46 ERROR [databricks.labs.ucx] {MainThread} Execute `databricks workspace export //Applications/ucx/logs/assessment/run-413746228294294/crawl_permissions.log` locally to troubleshoot with more details. Model serving is not enabled for your shard. Please contact your organization admin or Databricks support.
15:19:46 DEBUG [databricks] {MainThread} Task crash details
Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/tasks.py", line 255, in run_task
current_task.fn(cfg, workspace_client, sql_backend, installation)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py", line 240, in crawl_permissions
permission_manager.inventorize_permissions()
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/manager.py", line 94, in inventorize_permissions
crawler_tasks = list(self._get_crawler_tasks())
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/manager.py", line 221, in _get_crawler_tasks
yield from support.get_crawler_tasks()
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py", line 74, in get_crawler_tasks
for info in listing:
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py", line 58, in __iter__
for item in self._func():
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/service/serving.py", line 2596, in list
json = self._api.do('GET', '/api/2.0/serving-endpoints', headers=headers)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/core.py", line 130, in do
response = retryable(self._perform)(method,
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/retries.py", line 54, in wrapper
raise err
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/retries.py", line 33, in wrapper
return func(*args, **kwargs)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/core.py", line 238, in _perform
raise self._make_nicer_error(response=response, **payload) from None
databricks.sdk.errors.platform.NotFound: Model serving is not enabled for your shard. Please contact your organization admin or Databricks support.
@jarnawer okay, the fix is simple: surround these lines with `try: ... except NotFound: pass`, or something like that.
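A minimal sketch of that suggestion, under stated assumptions: `NotFound` below is a local stand-in for the `databricks.sdk.errors` exception seen in the traceback, and `safe_listing`, `list_clusters`, `list_serving_endpoints` and `crawl` are hypothetical helpers, not UCX functions. It only illustrates how one unavailable listing could be skipped while the others continue.

```python
import logging
from typing import Callable, Iterable, Iterator

logger = logging.getLogger(__name__)


class NotFound(Exception):
    """Local stand-in for the SDK's NotFound error (assumption for this sketch)."""


def safe_listing(name: str, func: Callable[[], Iterable[str]]) -> Iterator[str]:
    """Yield items from func(); skip the whole listing if it raises NotFound."""
    try:
        yield from func()
    except NotFound as err:
        # e.g. "Model serving is not enabled for your shard."
        logger.warning("Listing %s skipped: %s", name, err)


def list_clusters() -> Iterable[str]:
    # Hypothetical listing that succeeds.
    return ["0321-151452-bfwajp40"]


def list_serving_endpoints() -> Iterable[str]:
    # Hypothetical listing that fails the way the log above does.
    raise NotFound("Model serving is not enabled for your shard.")


def crawl() -> list[str]:
    """Run every listing and collect whatever could be listed."""
    listings = {
        "clusters": list_clusters,
        "serving-endpoints": list_serving_endpoints,
    }
    found: list[str] = []
    for name, func in listings.items():
        found.extend(safe_listing(name, func))
    return found
```

Logging a warning instead of a bare `pass` keeps the skipped listing visible in the debug log, which may matter when someone later wonders why no serving-endpoint permissions were inventoried.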
I'm sorry to say it does not appear this issue has been resolved.
I've attempted to re-run the assessment pipeline today following the release of v0.22.0, which includes the fix in PR #1275; however, the crawl_permissions job is still failing with the same 'FEATURE_DISABLED' error I was experiencing last week (v0.21.0).
This was a fresh installation of UCX and I have confirmed I am running the latest version:
// version.json
{
"version": "0.22.0",
"wheel": "/Applications/ucx/wheels/databricks_labs_ucx-0.22.0-py3-none-any.whl",
"date": "2024-04-29T08:43:20.038438+00:00"
}
I've attached the full logs in crawl_permissions.log.
Please let me know if there is anything you need or if I've missed some configuration.
Thank you.
Just hit this in UCX v0.27.1
InternalError: Listing serving-endpoints failed: Model serving is not enabled for your shard. Please contact your organization admin or Databricks support.
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
File ~/.ipykernel/4656/command--1-3817622542:18
15 entry = [ep for ep in metadata.distribution("databricks_labs_ucx").entry_points if ep.name == "runtime"]
16 if entry:
17 # Load and execute the entrypoint, assumes no parameters
---> 18 entry[0].load()()
19 else:
20 import importlib
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/runtime.py:103, in main(*argv)
101 if len(argv) == 0:
102 argv = sys.argv
--> 103 Workflows.all().trigger(*argv)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/runtime.py:80, in Workflows.trigger(self, *argv)
78 workflow = self._workflows[workflow_name]
79 if task_name == "parse_logs":
---> 80 return ctx.task_run_warning_recorder.snapshot()
81 # `{{parent_run_id}}` is the run of entire workflow, whereas `{{run_id}}` is the run of a task
82 workflow_run_id = named_parameters.get("parent_run_id", "unknown_run_id")
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/installer/logs.py:203, in TaskRunWarningRecorder.snapshot(self)
201 error_messages.append(message)
202 if len(error_messages) > 0:
--> 203 raise InternalError("\n".join(error_messages))
204 return log_records
InternalError: Listing serving-endpoints failed: Model serving is not enabled for your shard. Please contact your organization admin or Databricks support.
UCX v0.27.1
Is there an existing issue for this?
Current Behavior
I have several workspaces deployed in the Switzerland North Azure region. Due to regulatory compliance requirements, they must stay in that exact region.
When I run the assessment workflow, it breaks in the "Crawl Permissions" step. The step fails because it tries to list Model Serving endpoints in order to crawl their permissions, but according to the Databricks documentation, Model Serving is not available in Switzerland (https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/model-serving-limits#--region-availability)
Expected Behavior
The crawl permissions step should run regardless of whether Model Serving is available in the region. Crawling permissions for Model Serving should be configurable.
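To make the request concrete, here is a rough sketch of what such a switch could look like. Everything in it is hypothetical: `CrawlConfig`, `skip_listings` and `select_listings` are illustrative names only and are not existing UCX settings or functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set

Listing = Callable[[], Iterable[str]]


@dataclass
class CrawlConfig:
    # Hypothetical option: names of listings to leave out of the permissions crawl.
    skip_listings: Set[str] = field(default_factory=set)


def select_listings(all_listings: Dict[str, Listing], config: CrawlConfig) -> Dict[str, Listing]:
    """Return only the listings that the configuration does not exclude."""
    return {
        name: func
        for name, func in all_listings.items()
        if name not in config.skip_listings
    }


# Example: a workspace in a region without Model Serving opts out of that listing.
available = {
    "clusters": lambda: ["cluster-a"],
    "jobs": lambda: ["job-a"],
    "serving-endpoints": lambda: ["endpoint-a"],
}
config = CrawlConfig(skip_listings={"serving-endpoints"})
selected = select_listings(available, config)
```

With this shape, the excluded listing is never called, so a region where the endpoint returns `FEATURE_DISABLED` would not reach the failing request at all.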
Steps To Reproduce
No response
Cloud
Azure
Operating System
Linux
Version
latest via Databricks CLI
Relevant log output