Azure / spark-cdm-connector


Databricks Spark 2.4: java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/SupportsCatalogOptions #111

Closed dynarch closed 1 year ago

dynarch commented 1 year ago

I have installed the connector jar as a library on a Databricks cluster, and I can no longer read through the CDM connector. This code:

```python
entity_df = (spark.read.format("com.microsoft.cdm")
    .option("storage", cdsStorageAccountName)
    .option("manifestPath", cdsContainer + manifest_path)
    .option("entity", table_name)
    .load())
display(entity_df)
```

throws an error:

```
java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/SupportsCatalogOptions

Py4JJavaError                             Traceback (most recent call last)
in
      2     .option("storage", cdsStorageAccountName)
      3     .option("manifestPath", cdsContainer + manifest_path)
----> 4     .option("entity", table_name)
      5     .load())
      6 display(entity_df)
```

With the previous version 0.19.1 this worked without error.

Databricks cluster configuration:
- Runtime 6.4 (Apache Spark 2.4.5, Scala 2.11)
- spark.databricks.passthrough.enabled true
- spark.databricks.delta.preview.enabled true
- 2-8 workers, 28-112 GB memory, 8-32 cores
- 1 driver, 14 GB memory, 4 cores
kecheung commented 1 year ago

I think you can check #107. You are using a Spark 3 build of the jar on a Spark 2 cluster, so those classes don't exist there, which is expected.
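As a quick sanity check (a minimal sketch, not from the thread; it assumes you run it in the same Databricks notebook session), you can confirm the mismatch by printing the cluster's Spark version and probing for the Spark 3 catalog class through Py4J:

```python
# Diagnostic sketch: a Spark 3 build of the connector needs
# SupportsCatalogOptions, which only exists in Spark 3's DataSourceV2 API.
print(spark.version)  # e.g. "2.4.5" on this cluster

try:
    # Class.forName raises if the class is not on the cluster's classpath.
    spark._jvm.java.lang.Class.forName(
        "org.apache.spark.sql.connector.catalog.SupportsCatalogOptions")
    print("Spark 3 catalog API present: a Spark 3 connector jar can load")
except Exception:
    print("Class not found: install the Spark 2.4 build of the connector")
```

On a Spark 2.4 / Scala 2.11 cluster the class will not be found, so the fix is to install the connector build that matches the cluster's Spark version (for example the 0.19.x line that worked for you), or move to a Spark 3 runtime.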