databricks/spark-redshift

Redshift data source for Apache Spark
Apache License 2.0

Error when saving a dataframe to Redshift (java.lang.ArrayStoreException: java.lang.invoke.SerializedLambda) #459

marek-babic opened this issue 3 years ago

marek-babic commented 3 years ago

Hi there

I'm using the package io.github.spark-redshift-community:spark-redshift_2.12:4.2.0 as a dependency in an AWS EMR job, trying to save a dataframe to Redshift.
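For context, a minimal sketch of how the package is attached to the session (on EMR it could equally come in through spark-submit --packages; everything except the coordinates is a placeholder):

from pyspark.sql import SparkSession

# Pull the connector in via its Maven coordinates.
spark = SparkSession.builder \
    .appName("redshift-export") \
    .config("spark.jars.packages",
            "io.github.spark-redshift-community:spark-redshift_2.12:4.2.0") \
    .getOrCreate()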

Sadly, this attempt fails with the following stacktrace: https://gist.github.com/marek-babic/0110160bdd0ba11533b6f425559d2f1c

I know that the dataframe is in a healthy state, as show() and printSchema() output what I expect and the schema matches the one from the Redshift table.
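That is, roughly:

df.show(5)        # rows render as expected
df.printSchema()  # matches the Redshift table definition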

The code looks like this (the uppercase variables are set appropriately):

df.write \
  .format("io.github.spark_redshift_community.spark.redshift") \
  .option("url", "jdbc:redshift://" + HOST_URL + ":5439/" + DATABASE_NAME) \
  .option("user", USERNAME) \
  .option("password", PASSWORD) \
  .option("dbtable", TABLE_NAME) \
  .option("aws_region", REGION) \
  .option("aws_iam_role", IAM_ROLE) \
  .option("tempdir", TMP_PATH) \
  .option("tempformat", "CSV") \
  .mode("overwrite") \
  .save()

I tried to save the dataframe to S3 just by running:

df.write.format("csv").save(TMP_PATH + "/test1")

which worked, so the permissions in AWS are correct.
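One thing I haven't been able to rule out is a Scala binary-compatibility mismatch: as far as I can tell, an ArrayStoreException on java.lang.invoke.SerializedLambda typically means a lambda compiled against one Scala version is being deserialized under another, i.e. the connector's _2.12 suffix may not match the Scala version EMR's Spark build uses. A quick sketch for printing that version from PySpark (via the internal py4j gateway, so treat it as a debugging hack):

# Prints e.g. "version 2.12.10"; it should agree with the _2.12 artifact.
print(spark.sparkContext._jvm.scala.util.Properties.versionString())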

Any ideas why this could be happening? Thanks, Marek

SaravShah commented 3 months ago

Any solutions on this?