Current Behavior
The migrate-views task of the table migration workflow fails with an error that appears related to concurrent operations on the UCX internal database in the workspace.
Only some of the views are migrated on a given run.
After rerunning the table migration workflow several times, all views are eventually migrated.
(The views had no dependencies on each other.)
ManyError: Detected 2 failures: Unknown: [DELTA_CONCURRENT_DELETE_READ] ConcurrentDeleteReadException: This transaction attempted to read one or more files that were deleted (for example part-00000-0df56665-a002-4e1a-8c38-add89f9c16f3-c000.snappy.parquet in the root of the table) by a concurrent update. Please try the operation again. Conflicting commit: {"timestamp":1721292413579,"userId":"5695979632839528","userName":"jil.scott@domain.com","operation":"DELETE","operationParameters":{"predicate":["true"]},"job":{"jobId":"945981217581767","jobName":"[UCX] migrate-tables","jobRunId":"619297246306369","runId":"995793153947461","jobOwnerId":"5695979632839528","triggerType":"manual"},"clusterId":"0718-083610-suw33de7","readVersion":73,"isolationLevel":"WriteSerializable","isBlindAppend":false,"operationMetrics":{"numRemovedFiles":"1","numRemovedBytes":"3090","numCopiedRows":"0","numDeletionVectorsAdded":"0","numDeletionVectorsRemoved":"0","numAddedChangeFiles":"0","executionTimeMs":"917","numDeletionVectorsUpdated":"0","numDeletedRows":"42","scanTimeMs":"916","numAddedFiles":"0","numAddedBytes":"0","rewriteTimeMs":"0"},"tags":{"noRowsCopied":"true","delta.rowTracking.preserved":"false","restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/15.3.x-scala2.12","txnId":"2395993f-f87a-45f9-be39-52875d3d7793"} Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.
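The "Conflicting commit" payload in the error above is plain JSON, so it can be inspected programmatically to see which operation caused the conflict. The following is a minimal sketch, assuming the message keeps the `Conflicting commit: {...}` layout shown here; `parse_conflicting_commit` is a hypothetical helper, not part of UCX, and the sample string is an abbreviated copy of the message above.

```python
import json


def parse_conflicting_commit(message: str) -> dict:
    """Return the JSON object that follows 'Conflicting commit:' in an error message."""
    marker = "Conflicting commit:"
    start = message.index(marker) + len(marker)
    # raw_decode parses a single JSON value and ignores any trailing text,
    # such as the "Refer to ..." sentence at the end of the message.
    commit, _ = json.JSONDecoder().raw_decode(message[start:].lstrip())
    return commit


# Abbreviated copy of the error message from this report (only a few fields kept).
sample = (
    "[DELTA_CONCURRENT_DELETE_READ] ConcurrentDeleteReadException: "
    "Please try the operation again. "
    'Conflicting commit: {"operation":"DELETE",'
    '"operationParameters":{"predicate":["true"]},'
    '"job":{"jobName":"[UCX] migrate-tables","jobRunId":"619297246306369"},'
    '"readVersion":73} '
    "Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details."
)

commit = parse_conflicting_commit(sample)
summary = (
    commit["operation"],
    commit["operationParameters"]["predicate"],
    commit["job"]["jobRunId"],
)
print(summary)
```

In this report the conflicting commit is a `DELETE` with predicate `true` issued by the same `[UCX] migrate-tables` job run, which suggests the conflict comes from within the workflow itself rather than from an unrelated job.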
Expected Behavior
No such concurrency error. One run of migrate-views should be enough to migrate all views.
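The error text itself says "Please try the operation again", so one possible mitigation is to retry the failing operation when this specific error code appears. The sketch below is only an illustration under that assumption: `retry_on_concurrent_delete` and `flaky_migration` are hypothetical names, not UCX functions, and the error is detected by matching the `DELTA_CONCURRENT_DELETE_READ` code in the exception text.

```python
import time


def retry_on_concurrent_delete(operation, attempts=3, base_delay=1.0):
    """Call operation(), retrying only on DELTA_CONCURRENT_DELETE_READ errors."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as err:
            retryable = "DELTA_CONCURRENT_DELETE_READ" in str(err)
            if not retryable or attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for a migration step that hits the conflict twice and then succeeds.
calls = {"n": 0}


def flaky_migration():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("[DELTA_CONCURRENT_DELETE_READ] ConcurrentDeleteReadException")
    return "migrated"


result = retry_on_concurrent_delete(flaky_migration, attempts=3, base_delay=0.0)
print(result, calls["n"])
```

A retry of this kind only works around the conflict; it does not remove the concurrent writes that trigger it.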
Cloud
Azure
Operating System
Linux
Version
latest via Databricks CLI
Relevant log output
08:48:04 DEBUG [databricks.labs.lsql.backends] {migrate_views_0} [spark][execute] ALTER VIEW analytics_qa.legato.gold_vw_dim_aircraft_realtime SET TBLPROPERTIES ('upgraded_from' ... (109 more bytes)
08:48:05 DEBUG [databricks.labs.ucx.hive_metastore.table_migrate] {migrate_views_0} Migrating acls on analytics_qa.legato.gold_vw_dim_aircraft_realtime using SQL query: ALTER VIEW analytics_qa.legato.gold_vw_dim_aircraft_realtime OWNER TO `piotr.blaszczak@domain.com`
08:48:07 INFO [databricks.labs.blueprint.parallel] {migrate_views_0} migrate views 3/3, rps: 0.040/sec
08:48:07 ERROR [databricks.labs.blueprint.parallel] {MainThread} More than half 'migrate views' tasks failed: 33% results available (1/3). Took 0:01:15.471772
08:48:07 ERROR [databricks.labs.ucx] {MainThread} Execute `databricks workspace export //Applications/ucx/logs/migrate-tables/run-619297246306369-0/migrate_views.log` locally to troubleshoot with more details. Detected 2 failures: Unknown: [DELTA_CONCURRENT_DELETE_READ] ConcurrentDeleteReadException: This transaction attempted to read one or more files that were deleted (for example part-00000-0df56665-a002-4e1a-8c38-add89f9c16f3-c000.snappy.parquet in the root of the table) by a concurrent update. Please try the operation again.
Conflicting commit: {"timestamp":1721292413579,"userId":"5695979632839528","userName":"jil.scott@domain.com","operation":"DELETE","operationParameters":{"predicate":["true"]},"job":{"jobId":"945981217581767","jobName":"[UCX] migrate-tables","jobRunId":"619297246306369","runId":"995793153947461","jobOwnerId":"5695979632839528","triggerType":"manual"},"clusterId":"0718-083610-suw33de7","readVersion":73,"isolationLevel":"WriteSerializable","isBlindAppend":false,"operationMetrics":{"numRemovedFiles":"1","numRemovedBytes":"3090","numCopiedRows":"0","numDeletionVectorsAdded":"0","numDeletionVectorsRemoved":"0","numAddedChangeFiles":"0","executionTimeMs":"917","numDeletionVectorsUpdated":"0","numDeletedRows":"42","scanTimeMs":"916","numAddedFiles":"0","numAddedBytes":"0","rewriteTimeMs":"0"},"tags":{"noRowsCopied":"true","delta.rowTracking.preserved":"false","restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/15.3.x-scala2.12","txnId":"2395993f-f87a-45f9-be39-52875d3d7793"}
Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.
08:48:07 DEBUG [databricks] {MainThread} Task crash details
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/runtime.py", line 100, in trigger
    current_task(ctx)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/hive_metastore/workflows.py", line 63, in migrate_views
    ctx.tables_migrator.migrate_tables(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/hive_metastore/table_migrate.py", line 87, in migrate_tables
    return self._migrate_views(acl_strategy, all_grants_to_migrate, all_migrated_groups, all_principal_grants)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/hive_metastore/table_migrate.py", line 140, in _migrate_views
    Threads.strict("migrate views", tasks)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/blueprint/parallel.py", line 63, in strict
    raise ManyError(errs)
databricks.labs.blueprint.parallel.ManyError: Detected 2 failures: Unknown: [DELTA_CONCURRENT_DELETE_READ] ConcurrentDeleteReadException: This transaction attempted to read one or more files that were deleted (for example part-00000-0df56665-a002-4e1a-8c38-add89f9c16f3-c000.snappy.parquet in the root of the table) by a concurrent update. Please try the operation again.
Conflicting commit: {"timestamp":1721292413579,"userId":"5695979632839528","userName":"jil.scott@domain.com","operation":"DELETE","operationParameters":{"predicate":["true"]},"job":{"jobId":"945981217581767","jobName":"[UCX] migrate-tables","jobRunId":"619297246306369","runId":"995793153947461","jobOwnerId":"5695979632839528","triggerType":"manual"},"clusterId":"0718-083610-suw33de7","readVersion":73,"isolationLevel":"WriteSerializable","isBlindAppend":false,"operationMetrics":{"numRemovedFiles":"1","numRemovedBytes":"3090","numCopiedRows":"0","numDeletionVectorsAdded":"0","numDeletionVectorsRemoved":"0","numAddedChangeFiles":"0","executionTimeMs":"917","numDeletionVectorsUpdated":"0","numDeletedRows":"42","scanTimeMs":"916","numAddedFiles":"0","numAddedBytes":"0","rewriteTimeMs":"0"},"tags":{"noRowsCopied":"true","delta.rowTracking.preserved":"false","restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/15.3.x-scala2.12","txnId":"2395993f-f87a-45f9-be39-52875d3d7793"}
Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.
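The traceback shows the view migrations running in parallel via `Threads.strict`, and the conflicting commit is a full-table `DELETE` from the same job run. One reading, which is an assumption and not confirmed anywhere in this report, is that parallel tasks each rewrite the same shared table. The sketch below illustrates the general idea of serializing such a rewrite with a lock; `MigrationStatus` and the view names are hypothetical stand-ins, not UCX code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MigrationStatus:
    """Hypothetical stand-in for a shared table that is fully rewritten on each update."""

    def __init__(self):
        self._lock = threading.Lock()
        self._migrated = set()
        self.rows = []

    def record(self, view):
        # Holding the lock for the whole update keeps the "delete everything,
        # then insert the new snapshot" step from interleaving across threads.
        with self._lock:
            self._migrated.add(view)
            self.rows = sorted(self._migrated)


views = ["view_a", "view_b", "view_c"]  # hypothetical view names
status = MigrationStatus()
with ThreadPoolExecutor(max_workers=3) as pool:
    # Run the three "migrations" in parallel; each one updates the shared status.
    list(pool.map(status.record, views))

print(status.rows)
```

An in-process lock like this only coordinates threads inside one Python process. Writers on other clusters or in other jobs would still need Delta-level conflict handling.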
I encountered the same issue, and multiple attempts to rerun didn't help. Previously, a second try would complete the workflow without an error. Here is a more specific log to help understand what's happening.
Steps To Reproduce
Environment: Azure cloud. UCX version: v0.28.2
Config:
Steps:
All Tables migrated successfully using SYNC (all external tables).