delta-io / delta-sharing

An open protocol for secure data sharing
https://delta.io/sharing
Apache License 2.0

Getting "java.lang.NoSuchMethodError: 'java.lang.String org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.ROW_INDEX_TEMPORARY_COLUMN_NAME()'" #520

Open praveenkumarb1207 opened 3 months ago

praveenkumarb1207 commented 3 months ago

Hello Everyone,

I am trying to read a dataset that is shared through Delta Sharing (open sharing), and I get this error: java.lang.NoSuchMethodError: 'java.lang.String org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.ROW_INDEX_TEMPORARY_COLUMN_NAME()'.

I have followed the instructions in the README file. I have also tried the latest available version of the Delta Sharing package, i.e. 3.2.0.

spark-shell:

 spark-shell --packages "io.delta:delta-sharing-spark_2.12:3.1.0"

Code:

val tablePath = "file:///home/user/Downloads/config.share#test.default.department"
val df = spark.read.format("deltaSharing").load(tablePath)

Error:

java.lang.NoSuchMethodError: 'java.lang.String org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.ROW_INDEX_TEMPORARY_COLUMN_NAME()'
  at org.apache.spark.sql.delta.DeltaColumnMappingBase.$init$(DeltaColumnMapping.scala:73)
  at org.apache.spark.sql.delta.DeltaColumnMapping$.<init>(DeltaColumnMapping.scala:768)
  at org.apache.spark.sql.delta.DeltaColumnMapping$.<clinit>(DeltaColumnMapping.scala)
  at io.delta.sharing.spark.DeltaSharingDataSource.getHadoopFsRelationForDeltaSnapshotQuery(DeltaSharingDataSource.scala:413)
  at io.delta.sharing.spark.DeltaSharingDataSource.autoResolveBaseRelationForSnapshotQuery(DeltaSharingDataSource.scala:369)
  at io.delta.sharing.spark.DeltaSharingDataSource.createRelation(DeltaSharingDataSource.scala:237)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)
  ... 47 elided

avivunitq commented 3 weeks ago

I dug into this and it seems the function in question was only added in Spark 3.4.0. Looks like the Delta Sharing docs need to be updated to reflect this (the README only specifies Apache Spark 3+).
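
One way to confirm this kind of mismatch from the same spark-shell session is to look the method up by reflection before loading the table. The snippet below is only a minimal sketch: `hasMethod` and `hasRowIndexAccessor` are names made up for illustration, and the class and method names are copied from the stack trace above. If it prints `false`, the Spark jars on the classpath do not provide what the connector was compiled against.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark or Delta Sharing): returns true if
// `className` can be loaded and exposes a public zero-argument method
// called `methodName`.
def hasMethod(className: String, methodName: String): Boolean =
  Try(Class.forName(className).getMethod(methodName)).isSuccess

// Scala compiles `object ParquetFileFormat` to the JVM class
// `ParquetFileFormat$`, which is the name that appears in the stack trace.
val hasRowIndexAccessor: Boolean = hasMethod(
  "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$",
  "ROW_INDEX_TEMPORARY_COLUMN_NAME"
)

println(s"ROW_INDEX_TEMPORARY_COLUMN_NAME available: $hasRowIndexAccessor")
```

Running `spark.version` in the same shell shows which Spark release those jars come from.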

I updated my Spark dependencies to the latest version (3.5.2), and this fixed the issue.

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.5.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.5.2</version>
    </dependency>
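
To fail fast with a readable message instead of a NoSuchMethodError, a job could check the Spark version before calling `load`. This is a rough sketch under assumptions: `isAtLeast` is a hypothetical helper, and the 3.4 threshold is taken from the observation above, not from the Delta Sharing documentation.

```scala
import scala.util.Try

// Hypothetical helper: parses a "major.minor[.patch...]" version string and
// checks that it is at least minMajor.minMinor. Returns false if the string
// cannot be parsed.
def isAtLeast(version: String, minMajor: Int, minMinor: Int): Boolean =
  version.split("\\.").toList match {
    case major :: minor :: _ =>
      Try {
        val maj = major.toInt
        // Tolerate suffixes such as "5-SNAPSHOT" in the minor component.
        val min = minor.takeWhile(_.isDigit).toInt
        maj > minMajor || (maj == minMajor && min >= minMinor)
      }.getOrElse(false)
    case _ => false
  }

// Usage in spark-shell, where `spark` is the predefined SparkSession:
// require(isAtLeast(spark.version, 3, 4),
//   s"Spark ${spark.version} is older than 3.4; upgrade before reading the share")
```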