databrickslabs / mosaic

An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.
https://databrickslabs.github.io/mosaic/

Error encountered when using SparkConnect with Mosaic #418

Closed sastafford closed 1 year ago

sastafford commented 1 year ago

Describe the bug

Calling enable_mosaic() on a Spark Connect session raises PySparkNotImplementedError. I am using Databricks Connect 13.2 against a DBR 13.2 cluster.

Traceback (most recent call last):
  File "/Users/scott.stafford/Workspaces/lakehouse_playground/connect/geo.py", line 13, in <module>
    enable_mosaic(spark)
  File "/Users/scott.stafford/.local/share/virtualenvs/lakehouse_playground-iGlq0u6R/lib/python3.10/site-packages/mosaic/api/enable.py", line 47, in enable_mosaic
    _ = MosaicLibraryHandler(config.mosaic_spark)
  File "/Users/scott.stafford/.local/share/virtualenvs/lakehouse_playground-iGlq0u6R/lib/python3.10/site-packages/mosaic/core/library_handler.py", line 17, in __init__
    self.sc = spark.sparkContext
  File "/Users/scott.stafford/.local/share/virtualenvs/lakehouse_playground-iGlq0u6R/lib/python3.10/site-packages/pyspark/sql/connect/session.py", line 598, in __getattr__
    raise PySparkNotImplementedError(
pyspark.errors.exceptions.base.PySparkNotImplementedError: [NOT_IMPLEMENTED] sparkContext() is not implemented.

To Reproduce

Steps to reproduce the behavior:

  1. Set up DatabricksConnect on your local laptop.
  2. Run the following code snippet:
# Specify a Databricks configuration profile and
# the cluster_id field separately:
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config
from mosaic import enable_mosaic

config = Config(
    profile="e2",
    cluster_id="0707-182032-2lhgqobn",
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
enable_mosaic(spark)

Expected behavior

No errors

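Additional context

Spark Connect sessions do not expose a SparkContext, so the spark.sparkContext access in MosaicLibraryHandler.__init__ (library_handler.py, line 17 in the traceback) fails.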
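
For illustration, here is a minimal sketch of how a caller could check for this before calling enable_mosaic(). It is not part of Mosaic. The helper name has_spark_context and the two stand-in session classes are hypothetical, and the sketch assumes the error raised by the Connect session subclasses Python's built-in NotImplementedError, as the PySparkNotImplementedError name in the traceback suggests.

```python
def has_spark_context(session) -> bool:
    """Return True if `session` exposes a usable sparkContext attribute.

    Hypothetical helper: assumes a Spark Connect session raises an
    exception derived from NotImplementedError on this access.
    """
    try:
        session.sparkContext
    except NotImplementedError:
        return False
    return True


class FakeConnectSession:
    """Hypothetical stand-in that mimics the failure shown in the traceback."""

    def __getattr__(self, name):
        if name == "sparkContext":
            raise NotImplementedError(
                "[NOT_IMPLEMENTED] sparkContext() is not implemented."
            )
        raise AttributeError(name)


class FakeClassicSession:
    """Hypothetical stand-in for a session that does have a SparkContext."""

    sparkContext = object()


if __name__ == "__main__":
    for session in (FakeClassicSession(), FakeConnectSession()):
        label = type(session).__name__
        if has_spark_context(session):
            print(f"{label}: sparkContext available")
        else:
            print(f"{label}: no sparkContext, skipping enable_mosaic()")
```

With a real session, the same check would let a script fail early with a clear message instead of surfacing the error from inside the Mosaic library handler.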

mjohns-databricks commented 1 year ago

@sastafford we are working on Spark Connect support for the next series of Mosaic. Also, the 0.3 series currently supports up to DBR 12.2 LTS only; see the docs. I will reach out offline with more details.