audienceproject / spark-dynamodb

Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.
Apache License 2.0

Error when trying to write pyspark dataframe to DynamoDB #92

Open jcerquozzi opened 3 years ago

jcerquozzi commented 3 years ago

Hi,

I am trying to write a pyspark dataframe (that comes from a parquet file) to DynamoDB, but I am getting the following error:

AnalysisException: TableProvider implementation dynamodb cannot be written with ErrorIfExists mode, please use Append or Overwrite modes instead.;

The code I am using is:

df = sqlContext.read.parquet(path)

df.write.option("tableName", "dynamo_test") \
            .format("dynamodb") \
            .save()

I tried putting

df.write.option("tableName", "dynamo_test") \
                .format("dynamodb").mode("overwrite") \
                .save()

And got error:

AnalysisException: Table dynamo_test does not support truncate in batch mode.;;

rehevkor5 commented 3 years ago

I believe Append is the appropriate choice. Try adding:

.mode("append")

(or .mode(SaveMode.Append) in Scala; pyspark takes the save mode as a string).

The example in the README is misleading on this point. See also the method DynamoDBDataFrameWriter#dynamodb(tableName: String) in implicits.scala, which explicitly specifies SaveMode.Append.
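For reference, here is a minimal pyspark sketch of the corrected write. It assumes the spark-dynamodb connector JAR is on the classpath, AWS credentials are configured, the table dynamo_test exists, and path points at the source parquet data (all taken from the snippets above):

```python
from pyspark.sql import SparkSession

# Assumes the spark-dynamodb package is available, e.g. launched with
# --packages com.audienceproject:spark-dynamodb_2.12:<version>
spark = SparkSession.builder.appName("dynamo-write").getOrCreate()

df = spark.read.parquet(path)  # 'path' as in the original report

# "append" sidesteps both errors above: the default ErrorIfExists mode
# is rejected by the connector, and "overwrite" fails because the
# DynamoDB table does not support truncate in batch mode.
df.write.option("tableName", "dynamo_test") \
    .format("dynamodb") \
    .mode("append") \
    .save()
```

Note that in pyspark the mode is passed as a lowercase string; the SaveMode enum is only available from the Scala/Java API.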