I'm getting the same thing with
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.224"
as well. Any progress on this?
This mostly comes down to a dependency issue: the hadoop-aws and aws-java-sdk versions have to be compatible with each other.
@vnktsh Where can I find which version is compatible with which one? Shouldn't the latest builds of both be compatible?
I have the following build.sbt
version := "1.0"
scalaVersion := "2.11.8"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.2.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.2.0" % "provided"
// https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.8.2"
// https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-redshift
libraryDependencies += "com.amazonaws" % "aws-java-sdk-redshift" % "1.11.225"
// https://mvnrepository.com/artifact/com.databricks/spark-avro_2.11
libraryDependencies += "com.databricks" % "spark-avro_2.11" % "4.0.0"
// https://mvnrepository.com/artifact/com.databricks/spark-redshift_2.11
libraryDependencies += "com.databricks" % "spark-redshift_2.11" % "3.0.0-preview1"
// https://mvnrepository.com/artifact/com.eclipsesource.minimal-json/minimal-json
libraryDependencies += "com.eclipsesource.minimal-json" % "minimal-json" % "0.9.4"
// https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector
libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.11" % "2.2.0"
// https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver
libraryDependencies += "org.mongodb" % "mongo-java-driver" % "3.4.3"
// https://mvnrepository.com/artifact/org.mongodb.mongo-hadoop/mongo-hadoop-spark
libraryDependencies += "org.mongodb.mongo-hadoop" % "mongo-hadoop-spark" % "2.0.2"
@vetional: Try Hadoop 2.7.3, and don't include aws-sdk-core explicitly; hadoop-aws already pulls it in as a compile dependency. Use the following Maven template (including the exclusions) and adapt it for your sbt.
I would start by minimizing the build until the problem goes away: drop the Mongo-related dependencies for now, and include the Redshift JDBC driver either as a jar or as an sbt dependency.
TIP: Always check the library's page on mvnrepository.com to see its compile dependencies, and exclude duplicate versions if they conflict with other dependencies in your sbt/pom.xml.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>2.7.3</version>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
</exclusion>
<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
<exclusion>
<groupId>javax.servlet.jsp</groupId>
<artifactId>jsp-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
</exclusions>
</dependency>
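To confirm which aws-java-sdk version actually ends up on the classpath after the exclusions, you can print the resolved dependency tree from sbt. A minimal sketch, assuming sbt 1.4+ (older sbt versions can get the same task from the sbt-dependency-graph plugin):

// project/plugins.sbt -- enables the dependencyTree task that ships with sbt 1.4+
addDependencyTreePlugin

Then run sbt dependencyTree and look for the com.amazonaws entries under org.apache.hadoop:hadoop-aws.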
Your final sbt should look something like this:
version := "1.0" scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0" libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.2.0" libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.2.0" % "provided" libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" with all the exclusions from above template. libraryDependencies += "com.databricks" % "spark-avro_2.11" % "4.0.0" libraryDependencies += "com.databricks" % "spark-redshift_2.11" % "3.0.0-preview1" libraryDependencies += "com.eclipsesource.minimal-json" % "minimal-json" % "0.9.4" //libraryDependencies += possible dependency for redshift jdbc ...
My Redshift instance and S3 bucket are both in ap-south-1. I'm simply reading a file from S3 and writing it to Redshift. Here is the code:
// Create spark context sc
sc.hadoopConfiguration.set("fs.s3a.access.key", "<<>>")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "<<>>")

val df = <>

df.write
  .format("com.databricks.spark.redshift")
  .option("url", "?user=<<>>&password=<<>>")
  .option("dbtable", "public.xxx")
  .option("tempdir", "s3a://bucketname/folder")
  .mode(SaveMode.Append)
  .save()
SBT dependency:
scalaVersion := "2.10.5"
libraryDependencies += "com.databricks" %% "spark-redshift" % "1.1.0"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.210"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.210"
I'm adding the Redshift JDBC jar via the --jars option of spark-submit.
Error:
WARN Utils$: An error occurred while trying to read the S3 bucket lifecycle configuration
java.lang.IllegalArgumentException: Cannot create enum from ap-south-1 value!
    at com.amazonaws.regions.Regions.fromName(Regions.java:71)
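That message usually means the AWS SDK actually on the classpath is too old to know the ap-south-1 region (its Regions enum predates it), so the first thing to check is which aws-java-sdk version gets resolved and whether it recognises that region. If the job otherwise fails against an ap-south-1 bucket, a commonly suggested workaround is to point S3A at the regional endpoint explicitly; a minimal sketch (fs.s3a.endpoint is a standard hadoop-aws setting, but whether it clears this particular spark-redshift warning depends on the SDK version in use):

// Point the S3A connector at the ap-south-1 regional endpoint.
// ap-south-1 only supports V4 request signing, so an SDK/hadoop-aws
// build that supports V4 signing is assumed here.
sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.ap-south-1.amazonaws.com")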