Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB
MIT License

Cannot install on Databricks 11.3 LTS (Spark 3.3.0) #480

Open lotsahelp opened 1 year ago

lotsahelp commented 1 year ago

I'm trying to use the azure-cosmos-spark_3-3_2-12 (v4.15.0) connector from Maven, and it never finishes installing. I have also tried downloading the jar from Maven and installing it manually. It takes a few minutes to upload and install, but I'm left with the message below each time I try to call Cosmos DB. Changing back to Databricks 10.4 LTS with the 3-2_2-12 connector works fine.

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<command-725919012830118> in <cell line: 2>()
      1 ##CREATE Container and Database
----> 2 spark.sql(f'CREATE DATABASE IF NOT EXISTS cosmosCatalog.{cosmosDatabaseName};')
      3 
      4 spark.sql(
      5     f"""

/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46             start = time.perf_counter()
     47             try:
---> 48                 res = func(*args, **kwargs)
     49                 logger.log_success(
     50                     module_name, class_name, function_name, time.perf_counter() - start, signature

/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery, **kwargs)
   1117             sqlQuery = formatter.format(sqlQuery, **kwargs)
   1118         try:
-> 1119             return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1120         finally:
   1121             if len(kwargs) > 0:

/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1319 
   1320         answer = self.gateway_client.send_command(command)
-> 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323 

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    200                 # Hide where the exception came from that shows a non-Pythonic
    201                 # JVM exception message.
--> 202                 raise converted from None
    203             else:
    204                 raise

AnalysisException: Catalog 'cosmoscatalog' not found

FabianMeiswinkel commented 1 year ago

The error "Catalog 'cosmoscatalog' not found" indicates that the Spark Catalog with identifier "cosmoscatalog" has not been configured.

This is done by adding the following entries to the Spark config:

spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", "")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", "")
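For reference, a minimal notebook-cell sketch that combines the catalog registration with the CREATE DATABASE call from the traceback might look like this. The endpoint, key, and database name are placeholders, not values from this thread, and `spark` is the session Databricks provides in a notebook:

```python
# Placeholder values -- substitute your own account endpoint, key, and database name.
cosmosEndpoint = "https://<your-account>.documents.azure.com:443/"
cosmosMasterKey = "<your-account-key>"
cosmosDatabaseName = "<your-database>"

# Register the Cosmos DB catalog on the current Spark session.
spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", cosmosEndpoint)
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", cosmosMasterKey)

# The catalog name in the SQL statement must match the identifier configured above.
# Spark lower-cases catalog identifiers during lookup, which is why the error
# message in this issue shows 'cosmoscatalog' even though the cell used 'cosmosCatalog'.
spark.sql(f"CREATE DATABASE IF NOT EXISTS cosmosCatalog.{cosmosDatabaseName};")
```

If these settings are applied with `spark.conf.set` in a cell, that cell must run in the same session, before any SQL that references the catalog.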

From the behavior you describe, it is possible that the Spark 3.2 cluster has these Spark config settings defined in the cluster configuration, so they are applied at start-up, while the Spark 3.3 cluster does not have them.
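If the intent is to have the settings applied at cluster start-up rather than per notebook session, they can be entered in the Databricks cluster's Spark config (under the cluster's advanced options), one space-separated key/value pair per line. A sketch with placeholder values (the secret reference syntax assumes a Databricks secret scope has been set up):

```
spark.sql.catalog.cosmosCatalog com.azure.cosmos.spark.CosmosCatalog
spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint https://<your-account>.documents.azure.com:443/
spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey {{secrets/<scope>/<key-name>}}
```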

Thanks, Fabian

lotsahelp commented 1 year ago

@FabianMeiswinkel those three lines are in the cell above.