Closed: dmoore247 closed this issue 10 months ago.
Thank you for the feature request! Currently, the team operates in a limited capacity, carefully prioritizing, and we cannot provide a timeline to implement this feature. Please make a Pull Request if you'd like to see this feature sooner, and we'll guide you through the journey.
Also fails with:

```
22:43:43 INFO [d.labs.ucx] UCX v0.9.0 After job finishes, see debug logs at /Workspace/Users/first.last@databricks.com/.ucx/logs/assessment/run-67162835779538/estimate_table_size_for_migration.log
22:43:53 ERROR [d.labs.ucx] Task crashed. Execute `databricks workspace export /Users/first.last@databricks.com/.ucx/logs/assessment/run-67162835779538/estimate_table_size_for_migration.log` locally to troubleshoot with more details. [DELTA_TABLE_NOT_FOUND] Delta table `00_leone_retail`.`bd_test_tab1` doesn't exist.
```
Traceback:

```
22:43:53 ERROR [databricks.labs.ucx] {MainThread} Task crashed. Execute `databricks workspace export /Users/douglas.moore@databricks.com/.ucx/logs/assessment/run-67162835779538/estimate_table_size_for_migration.log` locally to troubleshoot with more details. [DELTA_TABLE_NOT_FOUND] Delta table `00_leone_retail`.`bd_test_tab1` doesn't exist.
22:43:53 DEBUG [databricks] {MainThread} Task crash details
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/table_size.py", line 71, in _safe_get_table_size
    return self._spark._jsparkSession.table(table_full_name).queryExecution().analyzed().stats().sizeInBytes()
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 194, in deco
    raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [DELTA_TABLE_NOT_FOUND] Delta table `00_leone_retail`.`bd_test_tab1` doesn't exist.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/tasks.py", line 179, in trigger
    current_task.fn(cfg)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py", line 68, in estimate_table_size_for_migration
    table_size.snapshot()
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/table_size.py", line 66, in snapshot
    return self._snapshot(partial(self._try_load), partial(self._crawl))
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/crawlers.py", line 283, in _snapshot
    loaded_records = list(loader())
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/table_size.py", line 45, in _crawl
    size_in_bytes = self._safe_get_table_size(table.key)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/table_size.py", line 76, in _safe_get_table_size
    raise RuntimeError(str(e)) from e
RuntimeError: [DELTA_TABLE_NOT_FOUND] Delta table `00_leone_retail`.`bd_test_tab1` doesn't exist.
```
@FastLee @mwojtyczka ^^
just happened during the demo:
We are not crawling shares as part of the `TableSizeCrawler` task. `DELTA_TABLE_NOT_FOUND` appears to be another exception type we need to catch, in addition to `TABLE_OR_VIEW_NOT_FOUND`.
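To illustrate the fix being discussed, here is a minimal sketch of a `_safe_get_table_size`-style wrapper that treats both error classes as "missing table". The helper names (`safe_get_table_size`, `is_missing_table_error`) and the string-matching approach are assumptions for illustration; in UCX the exception would be a PySpark `AnalysisException` rather than a plain `Exception`:

```python
# Error classes that should be treated as "table missing" rather than a crash.
# DELTA_TABLE_NOT_FOUND is the newly observed one; TABLE_OR_VIEW_NOT_FOUND was
# already handled. (Hypothetical sketch, not the actual UCX implementation.)
NOT_FOUND_ERROR_CLASSES = ("TABLE_OR_VIEW_NOT_FOUND", "DELTA_TABLE_NOT_FOUND")


def is_missing_table_error(message: str) -> bool:
    """Return True if the exception message carries a known not-found error class."""
    return any(error_class in message for error_class in NOT_FOUND_ERROR_CLASSES)


def safe_get_table_size(get_size, table_full_name):
    """Call get_size(table); return None instead of raising for missing tables."""
    try:
        return get_size(table_full_name)
    except Exception as e:  # in UCX this would be pyspark's AnalysisException
        if is_missing_table_error(str(e)):
            return None  # skip this table in the size snapshot
        raise  # unrelated errors should still surface
```

With this pattern, a broken Delta Sharing table yields `None` and the crawler moves on, while genuinely unexpected errors still propagate.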
@mwojtyczka users can create tables with `USING deltaSharing`:

```sql
CREATE TABLE hive_metastore.default.index_reports (
)
USING deltaSharing
LOCATION 'dbfs:/tmp/pt_config.share%23price-transparency-workshop.pt_stage.index_reports'
```
We must assume any table can be 'broken' and will throw an exception when described. While `DESCRIBE`-style commands throw exceptions on malformed tables, `SHOW CREATE TABLE` does not...
The newly implemented `estimate_table_size_for_migration` task will propagate errors and crash the entire task via an uncatchable Py4JJavaError. Logically, Delta Sharing tables should be skipped when crawling for size estimates. Advice: all table-level calls should be wrapped in defensive error handling.