databrickslabs / mosaic

An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.
https://databrickslabs.github.io/mosaic/

`enable_mosaic` incompatibility with Unity Catalog #359

Open khalid-dev opened 1 year ago

khalid-dev commented 1 year ago

Bug Description:

Calling `enable_mosaic(spark, dbutils)` on a Unity Catalog (UC)-enabled cluster with Shared access mode throws an error that originates in this method: https://github.com/databrickslabs/mosaic/blob/5acbc2eeeb93a543c4bc978a381979c4ad44e2c9/python/mosaic/core/library_handler.py#L18

Steps to Reproduce:

1) Create a Databricks Cluster with Access Mode set to Shared
2) Attach a notebook to cluster
3) Install mosaic via Cluster Libraries or `%pip install databricks-mosaic` in a notebook cell
4) Import & call `enable_mosaic(spark, dbutils)`
5) Should see the error in Description/Screenshots
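For readers who want to capture the failure as text rather than a screenshot, the import-and-call step could be wrapped roughly as below. This is only a sketch: the helper name `try_enable_mosaic` is hypothetical, and it assumes `databricks-mosaic` has already been installed on the cluster (for example with `%pip install databricks-mosaic` in an earlier cell). The thread does not include the error text, so none is shown here.

```python
def try_enable_mosaic(spark, dbutils):
    """Call enable_mosaic and return the error as text, or None on success.

    Hypothetical helper for collecting a bug report; not part of mosaic.
    """
    try:
        # Imported lazily so that a missing install is reported the same way
        # as a failure inside enable_mosaic itself.
        from mosaic import enable_mosaic
        enable_mosaic(spark, dbutils)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None
```

In a notebook, `print(try_enable_mosaic(spark, dbutils))` would then print either `None` or the exception type and message, which can be pasted into the issue.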

Expected behavior:

Screenshots:

Additional context:

edurdevic commented 1 year ago

Thank you for reporting this @khalid-dev! Shared access mode clusters support only Python and SQL. Mosaic is written in Scala, with Python and SQL bindings, so when you install it on a shared access mode cluster it does not work, because the bindings try to execute Scala calls. We are working on white listing it, but for now you need to use "Assigned" access mode with Unity Catalog.

khalid-dev commented 1 year ago

Of course @edurdevic! Appreciate you clarifying the Scala issue - I definitely overlooked that possibility. Curious for my own understanding: "white listing it" = white listing Scala? Or some methods needed for the library?

I also have more details after trying out "Assigned" access mode. I think I've narrowed the issue down to a DLT-UC-specific incompatibility:

Ideally, we'd like to use mosaic's geospatial capabilities in Delta Live Tables (DLT) pipelines with Unity Catalog. It seems that the DLT + UC combination restricts us to Shared mode. My current workaround is to use a Job to:

1) "Catalog dance" the appropriate tables off Unity Catalog
2) Invoke mosaic for geospatial processing on a compatible cluster
3) Write the results back to a Unity Catalog location
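The three Job steps above might look roughly like the following in PySpark. This is a sketch under stated assumptions, not the code actually used: the helper names `copy_table` and `geospatial_step`, the table names in the usage note, and the `transform` callable are all hypothetical, and the thread does not say how the steps are split across clusters.

```python
def copy_table(spark, source, target):
    """Steps 1 and 3: overwrite `target` with the contents of `source`.

    Both arguments are table names; which catalog each lives in is up to
    the caller.
    """
    spark.table(source).write.mode("overwrite").saveAsTable(target)


def geospatial_step(spark, dbutils, transform, source, target):
    """Step 2: enable mosaic, apply `transform`, and save the result.

    Assumes this runs on a cluster where enable_mosaic works.
    `transform` is a caller-supplied function from DataFrame to DataFrame.
    """
    # Imported lazily so copy_table stays usable where mosaic is not installed.
    from mosaic import enable_mosaic

    enable_mosaic(spark, dbutils)
    result = transform(spark.table(source))
    result.write.mode("overwrite").saveAsTable(target)
```

A Job could then call `copy_table(spark, "main.geo.raw", "hive_metastore.staging.raw")`, run `geospatial_step` from the staging input to a staging output table, and finish with a second `copy_table` back into a Unity Catalog table. All of those table names are placeholders.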

Hope this info is helpful - thanks again for maintaining a great library 🥳

kyleries commented 1 year ago

@khalid-dev - thanks for the detailed write-up of this issue. I recently ran into the exact same issue. The catalog dance workaround is a creative idea. If we come up with any alternatives, I'll drop a note here. Otherwise, we're keen to see this resolved at a lower level in the stack as our experience w/ Mosaic so far has been very favorable.