audienceproject / spark-dynamodb

Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.
Apache License 2.0

Read from dynamoDb with spark throwing java.lang.AbstractMethodError #78

Closed asutoshparida closed 4 years ago

asutoshparida commented 4 years ago

Hi, we are trying to read from DynamoDB using Spark. I am using Spark 2.3.3, Scala 2.11.11, Hadoop 3.2.0 and

```xml
<dependency>
    <groupId>com.audienceproject</groupId>
    <artifactId>spark-dynamodb_2.11</artifactId>
    <version>1.0.2</version>
</dependency>
```

Below is the sample code:

```scala
val spark = SparkSession.builder
  .appName("SparkDynamoDBExample")
  .master("local[4]")
  .getOrCreate()

var dynamoDf = spark.read.dynamodb("xcloud.test")
dynamoDf.count()
```

But it's throwing:

```
Exception in thread "main" java.lang.AbstractMethodError: com.audienceproject.spark.dynamodb.datasource.DynamoDataSourceReader.createDataReaderFactories()Ljava/util/List;
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.readerFactories$lzycompute(DataSourceV2ScanExec.scala:55)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.readerFactories(DataSourceV2ScanExec.scala:52)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD$lzycompute(DataSourceV2ScanExec.scala:76)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD(DataSourceV2ScanExec.scala:60)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDDs(DataSourceV2ScanExec.scala:79)
	at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:150)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:610)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
```

Please suggest. Thanks

fschueler commented 4 years ago

Hi, we ran into the same issue with Scala 2.11.8 and Spark 2.3.2.

As a workaround, we downgraded to version 0.4.4.
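For reference, pinning the downgrade in Maven might look like the following (a sketch, assuming the same `com.audienceproject` / `spark-dynamodb_2.11` coordinates as the 1.0.2 dependency above):

```xml
<dependency>
    <groupId>com.audienceproject</groupId>
    <artifactId>spark-dynamodb_2.11</artifactId>
    <version>0.4.4</version>
</dependency>
```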

asutoshparida commented 4 years ago

Hi, now I'm facing the old issue again:

https://github.com/audienceproject/spark-dynamodb/issues/29

asutoshparida commented 4 years ago

Thanks @fschueler for your help. Finally I was able to solve the issue by using the configuration below:

- Spark 2.3.3
- Scala 2.11.12
- spark-dynamodb_2.11 0.4.4
- guava 14.0.1

It even works on EMR emr-5.22.0 without any issue.
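For anyone landing here later, the working combination could be pinned in an sbt build roughly like this (a sketch only; it assumes the standard Maven Central coordinates for each artifact, which are not spelled out in the thread):

```scala
// build.sbt — sketch of the version set reported to work in this thread
// (assumption: standard Maven Central coordinates for each artifact)
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark"    %% "spark-sql"      % "2.3.3" % "provided",
  "com.audienceproject" %% "spark-dynamodb" % "0.4.4",
  "com.google.guava"    %  "guava"          % "14.0.1"  // Guava pinned to the version reported above
)
```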