Open khalid-dev opened 1 year ago
Thank you for reporting this @khalid-dev! Shared access clusters only support Python and SQL languages. Mosaic is written in Scala, with Python and SQL bindings. So when you install it in a shared access cluster it actually does not work because it is trying to execute Scala calls. We are working on white listing it, but for now you need to use "Assigned" access mode with Unity Catalog.
Of course @edurdevic! Appreciate you clarifying the Scala issue - I definitely overlooked that possibility. Curious for my own understanding: "white listing it" = white listing Scala? Or some methods needed for the library?
I also have more details after trying out "Assigned" access mode. I think I've narrowed the issue down to a DLT-UC-specific incompatibility:
Ideally, we'd like to leverage mosaic's geospatial capabilities in a Delta Live Tables pipeline with Unity Catalog. It seems that the DLT + UC combo restricts us to `Shared` mode. My current workaround is using a Job to:
1) "Catalog dance" appropriate tables off Unity Catalog
2) Invoke mosaic for geospatial processing on a compatible cluster
3) Write results back to Unity Catalog location
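A rough sketch of that three-step job, in case it helps anyone following along. Everything here is an assumption for illustration: the function names, the table names in the usage note, and `geo_transform`, which stands in for whatever Mosaic-based DataFrame logic you run. Only `spark.table` and `DataFrame.write.mode(...).saveAsTable(...)` are standard PySpark calls.

```python
from typing import Callable


def _copy_table(spark, source: str, target: str) -> None:
    """Read `source` and overwrite `target` with its contents."""
    spark.table(source).write.mode("overwrite").saveAsTable(target)


def stage_out_of_uc(spark, uc_table: str, staging_table: str) -> None:
    """Step 1: copy a Unity Catalog table to a staging location."""
    _copy_table(spark, uc_table, staging_table)


def run_geospatial_step(
    spark, staging_table: str, result_table: str, geo_transform: Callable
) -> None:
    """Step 2: apply a DataFrame -> DataFrame transform and save the result.

    `geo_transform` is a hypothetical placeholder for the Mosaic processing;
    this step is meant to run on a cluster where Mosaic can be enabled.
    """
    result = geo_transform(spark.table(staging_table))
    result.write.mode("overwrite").saveAsTable(result_table)


def publish_to_uc(spark, result_table: str, uc_table: str) -> None:
    """Step 3: copy the processed table back to a Unity Catalog location."""
    _copy_table(spark, result_table, uc_table)
```

In a Job, each function would be its own task so that step 2 can be pinned to the compatible cluster, e.g. `stage_out_of_uc(spark, "main.geo.raw", "hive_metastore.staging.raw")` with placeholder table names. Whether the staging location is visible from both clusters depends on your workspace setup.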
Hope this info is helpful - thanks again for maintaining a great library 🥳
@khalid-dev - thanks for the detailed write-up of this issue. I recently ran into the exact same issue. The catalog dance workaround is a creative idea. If we come up with any alternatives, I'll drop a note here. Otherwise, we're keen to see this resolved at a lower level in the stack as our experience w/ Mosaic so far has been very favorable.
Bug Description:
Calling `enable_mosaic(spark, dbutils)` on a UC-enabled cluster with `Shared` Access Mode throws an error due to this method: https://github.com/databrickslabs/mosaic/blob/5acbc2eeeb93a543c4bc978a381979c4ad44e2c9/python/mosaic/core/library_handler.py#L18
The cluster's Access Mode is `Shared`, enabling interaction with Unity Catalog tables.
`enable_mosaic(spark, dbutils)` on this cluster gives a stack-trace to this line & throws a `py4j.security.Py4JSecurityException`.
Steps to Reproduce:
1) Create a Databricks Cluster with Access Mode set to `Shared`
2) Attach a notebook to cluster
3) Install `mosaic` via Cluster Libraries or `%pip install databricks-mosaic` in a notebook cell
4) Import & call `enable_mosaic(spark, dbutils)`
5) Should see the error in Description/Screenshots
Expected behavior:
`enable_mosaic` on a `Shared` Access Mode Databricks Cluster shouldn't throw an error
Screenshots:
Additional context:
Does `mosaic` internally call other methods that also aren't allow-listed when using Unity Catalog?
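Until this is resolved, a notebook could at least surface the failure with a clearer message. The wrapper below is a minimal sketch, not part of Mosaic: `enable_with_hint` is a hypothetical helper, and it assumes the Python-side exception text contains the Java class name `Py4JSecurityException`, as the stack trace in this report suggests.

```python
def enable_with_hint(enable, *args):
    """Call enable(*args); re-raise allow-list failures with a clearer hint.

    Assumption: the blocked call surfaces in Python as an exception whose
    text or type name mentions "Py4JSecurityException".
    """
    try:
        return enable(*args)
    except Exception as exc:
        marker = "Py4JSecurityException"
        if marker in str(exc) or marker in type(exc).__name__:
            raise RuntimeError(
                "The call was blocked by the cluster's allow-list. "
                "Shared access mode does not permit the underlying Scala "
                "calls; try a cluster with Assigned access mode."
            ) from exc
        raise  # unrelated errors pass through unchanged


# Intended use in a notebook (not executed here):
#   from mosaic import enable_mosaic
#   enable_with_hint(enable_mosaic, spark, dbutils)
```

Matching on the message text is brittle, so treat this as a diagnostic aid rather than a fix.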