Error while writing data to Redshift: S3 bucket lifecycle configuration, java.lang.IllegalArgumentException: Cannot create enum from ap-south-1 value! #368

Open ramanathanramaiyah opened 7 years ago

ramanathanramaiyah commented 7 years ago

The Redshift instance and the S3 bucket are both in ap-south-1. I am simply reading a file from S3 and writing it to Redshift. Here is the code:

import org.apache.spark.sql.SaveMode

// Create the SparkContext sc, then set the S3A credentials
sc.hadoopConfiguration.set("fs.s3a.access.key", "<<>>")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "<<>>")

val df = <>
df.write
  .format("com.databricks.spark.redshift")
  .option("url", "?user=<<>>&password=<<>>")
  .option("dbtable", "")
  .option("tempdir", "s3a://bucketname/folder")
  .mode(SaveMode.Append)
  .save()

SBT dependency:

scalaVersion := "2.10.5"
libraryDependencies += "com.databricks" %% "spark-redshift" % "1.1.0"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.210"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.210"

I am adding the Redshift JDBC jar via the --jars option of spark-submit.


WARN Utils$: An error occurred while trying to read the S3 bucket lifecycle configuration
java.lang.IllegalArgumentException: Cannot create enum from ap-south-1 value!
    at com.amazonaws.regions.Regions.fromName(
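The stack trace points at `Regions.fromName`, i.e. a lookup by region name. The sketch below is a simplified model of that failure, not the AWS SDK's source: the two region tables are made up for illustration, and it assumes only that the lookup throws when its table has no entry for the requested name. It shows why an older SDK jar on the classpath fails for ap-south-1 while a newer one succeeds.

```scala
// Simplified model (NOT the AWS SDK source) of a name-based region lookup.
// If the SDK jar actually loaded predates a region, its table lacks that
// region's name and the lookup throws IllegalArgumentException.
object RegionLookupModel {

  // Hypothetical tables: an "older" SDK without ap-south-1 and a "newer" one with it.
  val olderSdkRegions: Seq[String] = Seq("us-east-1", "us-west-2", "eu-west-1", "ap-southeast-1")
  val newerSdkRegions: Seq[String] = olderSdkRegions :+ "ap-south-1"

  /** Returns the region name if the table knows it, otherwise throws like the error above. */
  def fromName(known: Seq[String], regionName: String): String =
    known.find(_ == regionName).getOrElse(
      throw new IllegalArgumentException(s"Cannot create enum from $regionName value!")
    )

  def main(args: Array[String]): Unit = {
    println(fromName(newerSdkRegions, "ap-south-1")) // prints ap-south-1
    try fromName(olderSdkRegions, "ap-south-1")
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```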

vetional commented 7 years ago

I'm getting the same error with this dependency too:

libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.224"

Any progress on this?

vnktsh commented 7 years ago

This usually happens because of a dependency conflict: the versions of hadoop-aws and the AWS SDK on the classpath have to be compatible with each other.
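One way to confirm such a conflict is to check, at runtime, which jar the SDK's `Regions` enum was actually loaded from and whether that version declares ap-south-1. A minimal sketch using only JDK reflection, so it compiles without the AWS SDK; it assumes the enum constant is named `AP_SOUTH_1`, and `RegionCheck`, `locate` and `enumHasConstant` are hypothetical names:

```scala
// Diagnostic sketch: report where com.amazonaws.regions.Regions comes from and
// whether that version declares the (assumed) enum constant AP_SOUTH_1.
object RegionCheck {

  /** Jar or directory the class was loaded from, or None if it is not on the classpath. */
  def locate(className: String): Option[String] =
    try {
      val cls = Class.forName(className)
      // getCodeSource is null for JDK classes loaded by the bootstrap class loader
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException => None
    }

  /** True if `className` is an enum on the classpath that declares the constant `constant`. */
  def enumHasConstant(className: String, constant: String): Boolean =
    try {
      val cls = Class.forName(className)
      cls.isEnum && cls.getField(constant).isEnumConstant
    } catch {
      case _: ClassNotFoundException | _: NoSuchFieldException => false
    }

  def main(args: Array[String]): Unit = {
    val regions = "com.amazonaws.regions.Regions"
    println(s"$regions loaded from: ${locate(regions).getOrElse("not on classpath")}")
    println(s"Declares AP_SOUTH_1: ${enumHasConstant(regions, "AP_SOUTH_1")}")
  }
}
```

Run it with the same classpath as the failing job. If the reported location is an older SDK jar than the one you declared, or `AP_SOUTH_1` is missing, that points to the version conflict described above.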

vetional commented 7 years ago

@vnktsh Where can I find which version is compatible with which one? Shouldn't the latest builds of both be compatible?

I have the following build.sbt:

version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.2.0"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.2.0" % "provided"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.8.2"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-redshift" % "1.11.225"
libraryDependencies += "com.databricks" % "spark-avro_2.11" % "4.0.0"
libraryDependencies += "com.databricks" % "spark-redshift_2.11" % "3.0.0-preview1"
libraryDependencies += "com.eclipsesource.minimal-json" % "minimal-json" % "0.9.4"
libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.11" % "2.2.0"
libraryDependencies += "org.mongodb" % "mongo-java-driver" % "3.4.3"
libraryDependencies += "org.mongodb.mongo-hadoop" % "mongo-hadoop-spark" % "2.0.2"

vnktsh commented 7 years ago

@vetional: Try Hadoop 2.7.3, and don't include aws-java-sdk-core explicitly; hadoop-aws already has a compile dependency on the AWS SDK. Use the following Maven template (including the exclusions) and adapt it for your sbt build.

I would start by minimizing the code until the problem goes away: probably remove the Mongo-related dependencies, and include the Redshift JDBC driver either as a jar or as a dependency in sbt.

TIP: Always check the Maven repository website to see each artifact's compile dependencies, and exclude duplicate versions if they conflict with other dependencies in your sbt/pom.xml.
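In sbt, the Maven `<exclusions>` element maps to `exclude(organization, name)` on a dependency. A minimal build.sbt sketch, assuming jackson-databind is the clashing transitive artifact; that choice is purely illustrative and is not the template referred to above, so substitute whatever conflicts your own build shows:

```scala
// build.sbt sketch: keep hadoop-aws 2.7.3 and let it bring in the AWS SDK it
// declares, but drop a transitive artifact that clashes with the version
// Spark already provides. The excluded artifact below is an ASSUMED example.
libraryDependencies += ("org.apache.hadoop" % "hadoop-aws" % "2.7.3")
  .exclude("com.fasterxml.jackson.core", "jackson-databind") // illustrative only

// No explicit aws-java-sdk-core / aws-java-sdk-* lines here:
// hadoop-aws already declares the AWS SDK version it was built against.
```

Running sbt's `evicted` task afterwards lists the version conflicts sbt resolved, which is a quick way to spot the duplicates worth excluding.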


Your final sbt should look something like this:

version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.2.0"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.2.0" % "provided"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" // with all the exclusions from the template above
libraryDependencies += "com.databricks" % "spark-avro_2.11" % "4.0.0"
libraryDependencies += "com.databricks" % "spark-redshift_2.11" % "3.0.0-preview1"
libraryDependencies += "com.eclipsesource.minimal-json" % "minimal-json" % "0.9.4"
//libraryDependencies += possible dependency for redshift jdbc ...