marek-babic opened 3 years ago
Hi there
I'm using the package io.github.spark-redshift-community:spark-redshift_2.12:4.2.0 as a dependency in an AWS EMR job that tries to save a DataFrame to Redshift.
Sadly, the attempt fails with the following stacktrace: https://gist.github.com/marek-babic/0110160bdd0ba11533b6f425559d2f1c
I know the DataFrame is in a healthy state: show() and printSchema() output what I expect, and the schema matches the one of the Redshift table.
The code looks like this (where the capital-letter vars are set appropriately):

```python
df.write \
    .format("io.github.spark_redshift_community.spark.redshift") \
    .option("url", "jdbc:redshift://" + HOST_URL + ":5439/" + DATABASE_NAME) \
    .option("user", USERNAME) \
    .option("password", PASSWORD) \
    .option("dbtable", TABLE_NAME) \
    .option("aws_region", REGION) \
    .option("aws_iam_role", IAM_ROLE) \
    .option("tempdir", TMP_PATH) \
    .option("tempformat", "CSV") \
    .mode("overwrite") \
    .save()
```
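For anyone comparing configurations: the "url" option above concatenates into a standard Redshift JDBC URL. A minimal sketch of the resulting string, using hypothetical host and database values in place of the capital-letter vars:

```python
# Hypothetical stand-ins for HOST_URL and DATABASE_NAME from the snippet above.
HOST_URL = "my-cluster.abc123.eu-west-1.redshift.amazonaws.com"
DATABASE_NAME = "dev"

# Assembled the same way as the "url" option in the write call.
url = "jdbc:redshift://" + HOST_URL + ":5439/" + DATABASE_NAME
print(url)
# jdbc:redshift://my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/dev
```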
I tried to save the DataFrame to S3 just by running:

```python
df.write.format("csv").save(TMP_PATH + "/test1")
```
which worked, so the permissions in AWS are correct.
Any ideas why this could be happening? Thanks, Marek
Any solutions on this?