awslabs / emr-dynamodb-connector

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Apache License 2.0

Does this connector work with Spark 3.3 and EMR 6.11 #191

Open SravaniMaddala opened 10 months ago

SravaniMaddala commented 10 months ago

I am trying to use this connector with Spark 3.3 on EMR 6.11, but the resulting RDD is always empty. Is a version mismatch causing this, or am I missing something?

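import org.apache.hadoop.dynamodb.DynamoDBItemWritable;
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

SparkSession sparkSession = SparkSession.builder().getOrCreate();
SparkContext sparkContext = sparkSession.sparkContext();

JavaSparkContext sc = new JavaSparkContext(sparkContext);

// Point the Hadoop job configuration at the DynamoDB table and the connector's formats.
JobConf jobConf = new JobConf(sc.hadoopConfiguration());
jobConf.set("dynamodb.input.tableName", "test-dynamo");
jobConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat");
jobConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat");

// Read the table as (Text, DynamoDBItemWritable) pairs.
JavaPairRDD<Text, DynamoDBItemWritable> rows = sc.hadoopRDD(jobConf, DynamoDBInputFormat.class, Text.class, DynamoDBItemWritable.class);

rows.count();

custommonkey commented 8 months ago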

The unreleased master branch does seem to work with Spark 3.3. I had the same experience: the latest release, 4.16, did not work with Spark 3.3.