databrickslabs / ucx

Automated migrations to Unity Catalog

Test failure: `test_fresh_user_installation` #2622

Open github-actions[bot] opened 4 days ago

github-actions[bot] commented 4 days ago
❌ test_fresh_user_installation: TimeoutError: timed out after 0:20:00: (20m14.747s)

```
TimeoutError: timed out after 0:20:00:
[gw6] linux -- Python 3.10.14 /home/runner/work/ucx/ucx/.venv/bin/python
05:16 INFO [databricks.labs.ucx.mixins.fixtures] Schema hive_metastore.ucx_sokdo: https://DATABRICKS_HOST/explore/data/hive_metastore/ucx_sokdo
05:16 DEBUG [databricks.labs.ucx.mixins.fixtures] added schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_sokdo', metastore_id=None, name='ucx_sokdo', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
05:16 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.VLC2/config.yml) doesn't exist.
05:16 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
05:16 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
05:16 INFO [databricks.labs.ucx.install] Fetching installations...
05:16 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
05:16 DEBUG [tests.integration.conftest] Waiting for clusters to start...
05:36 ERROR [databricks.labs.blueprint.parallel] ensure clusters running('TEST_USER_ISOLATION_CLUSTER_ID') task failed: timed out after 0:20:00:
Traceback (most recent call last):
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/blueprint/parallel.py", line 158, in inner
    return func(*args, **kwargs), None
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/sdk/mixins/compute.py", line 235, in ensure_cluster_is_running
    self.wait_get_cluster_terminated(cluster_id)
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/sdk/service/compute.py", line 6746, in wait_get_cluster_terminated
    raise TimeoutError(f'timed out after {timeout}: {status_message}')
TimeoutError: timed out after 0:20:00:
05:36 ERROR [databricks.labs.blueprint.parallel] More than half 'ensure clusters running' tasks failed: 0% results available (0/3). Took 0:20:04.609766
05:36 WARNING [databricks.labs.ucx.install] UCX workspace remote version not found: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.VLC2/version.json) doesn't exist.
05:36 WARNING [databricks.labs.ucx.install] Installed version is too old: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.VLC2/version.json) doesn't exist.
05:36 DEBUG [tests.integration.conftest] Waiting for clusters to start...
05:36 DEBUG [tests.integration.conftest] Waiting for clusters to start...
05:36 INFO [databricks.labs.ucx.install] Deleting UCX v0.35.1+3220240913053614 from https://DATABRICKS_HOST
05:36 INFO [databricks.labs.ucx.install] Deleting inventory database ucx_sokdo
05:36 INFO [databricks.labs.ucx.install] Deleting jobs
05:36 ERROR [databricks.labs.ucx.install] No jobs present or jobs already deleted
05:36 INFO [databricks.labs.ucx.install] Deleting cluster policy
05:36 INFO [databricks.labs.ucx.install] Deleting secret scope
05:36 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
05:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 workspace user fixtures
05:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 account group fixtures
05:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 workspace group fixtures
05:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 table fixtures
05:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 table fixtures
05:36 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 schema fixtures
05:36 DEBUG [databricks.labs.ucx.mixins.fixtures] removing schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_sokdo', metastore_id=None, name='ucx_sokdo', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
[gw6] linux -- Python 3.10.14 /home/runner/work/ucx/ucx/.venv/bin/python
```

Running from nightly #194

JCZuurmond commented 4 days ago

This bug has me a bit in the woods. It fails while ensuring that the test cluster `TEST_USER_ISOLATION_CLUSTER_ID` is running during test setup. I checked the corresponding cluster and did not see anything noticeable, except:

JCZuurmond commented 4 days ago

Note that some Databricks issues were reported yesterday for the same region as our testing environment: https://status.azuredatabricks.net/pages/incident/5d49ec10226b9e13cb6a422e/66e369a280a4e633c1aec6e7. However, the incident's time window does not overlap with the time we ran our integration test suite.

pritishpai commented 10 hours ago

I tried running this a couple of times from the IDE, and it succeeded. I also manually stopped the cluster; the "ensure cluster is running" step then starts it and the test succeeds. Do we close this, or add a cluster restart or something similar?
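
If a restart is the route taken, it could look roughly like the sketch below. This is only an illustration of the idea, not code from UCX or the Databricks SDK: `ensure_running_with_restart` is a hypothetical helper, and both `ensure_running` and `restart` are callables supplied by the caller, so no particular SDK signature is assumed. The one detail taken from the traceback above is that the wait raises the built-in `TimeoutError`.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def ensure_running_with_restart(
    cluster_id: str,
    ensure_running: Callable[[str], None],
    restart: Callable[[str], None],
    max_restarts: int = 1,
) -> None:
    """Wait for a cluster to run; on TimeoutError, restart it and wait again.

    Hypothetical helper: `ensure_running` and `restart` are injected so the
    sketch does not depend on any specific SDK call.
    """
    for attempt in range(max_restarts + 1):
        try:
            ensure_running(cluster_id)
            return
        except TimeoutError:
            # Out of restart attempts: surface the original timeout.
            if attempt == max_restarts:
                raise
            logger.warning(
                "Cluster %s did not come up in time; restarting (attempt %d of %d)",
                cluster_id,
                attempt + 1,
                max_restarts,
            )
            restart(cluster_id)
```

In the integration test setup, `ensure_running` would presumably be bound to the `ensure_cluster_is_running` call seen in the traceback, and `restart` to whatever restart call the test environment allows. Keep in mind that each retry can add up to another 20 minutes to the run, and a restart only helps if the cluster is stuck rather than the platform being unavailable.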