Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB
MIT License

No Such Method Error for Spark 3.5.0 (Databricks 14.3 LTS) #487

Closed RamonRay closed 5 months ago

RamonRay commented 5 months ago

When running the following PySpark script on Databricks 14.3 LTS (Spark 3.5.0):

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("id", StringType(), False),
    StructField("some_other_field", StringType(), False)
])

(spark.read
    .format("cosmos.oltp")
    .schema(schema)
    .options(**cfg)
    .load()
    .where("some_other_field = 'xxx'")
    .show())
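For context, the `cfg` dict passed to `.options(**cfg)` is assumed to contain the documented Azure Cosmos DB Spark connector settings; the values below are placeholders, not the reporter's actual configuration:

```python
# Hypothetical connector configuration for the cosmos.oltp source.
# Option names are the standard Azure Cosmos DB Spark connector settings;
# all values here are placeholders.
cfg = {
    "spark.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<account-key>",
    "spark.cosmos.database": "<database>",
    "spark.cosmos.container": "<container>",
}
```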

I get the following exception:

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
    at com.azure.cosmos.spark.RowSerializerPool$.getOrCreateSerializer(RowSerializerPool.scala:42)
    at com.azure.cosmos.spark.ItemsPartitionReader.<init>(ItemsPartitionReader.scala:259)
    at com.azure.cosmos.spark.ItemsScanPartitionReaderFactory.createReader(ItemsScanPartitionReaderFactory.scala:56)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:85)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:64)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$5(UnsafeRowBatchUtils.scala:88)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$3(UnsafeRowBatchUtils.scala:88)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$1(UnsafeRowBatchUtils.scala:68)

I saw that Spark 3.5.0 is not yet supported. May I ask when it will be supported so that this issue is fixed?

FabianMeiswinkel commented 5 months ago

The Spark 3.5 support PR has just been merged; see https://github.com/Azure/azure-sdk-for-java/pull/39395

It will be released within the next 2 to 3 weeks.
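Once the release is out, a job on Spark 3.5 would need to reference the connector build compiled against Spark 3.5 rather than the Spark 3.4 artifact. A sketch of the submit command, assuming the coordinate follows the connector's existing `azure-cosmos-spark_<spark>_<scala>` naming convention (the version placeholder must be replaced with the actual released version):

```shell
# Pull in the Spark 3.5 build of the Azure Cosmos DB Spark connector.
# The Maven coordinate is inferred from the connector's naming convention
# for earlier Spark versions; substitute the real released version.
spark-submit \
  --packages com.azure.cosmos.spark:azure-cosmos-spark_3-5_2-12:<version> \
  your_job.py
```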