databricks / spark-redshift

Redshift data source for Apache Spark
Apache License 2.0

error saving dataframe to emr 9.0 (spark 2.2) #372

Open rsachar06 opened 6 years ago

rsachar06 commented 6 years ago

I get an error when saving a DataFrame to Redshift on EMR 9.0 (Spark 2.2); reads work fine.

rsachar06 commented 6 years ago

I forgot to post the error. Please assist:

```scala
eventsDF.write
  .format("com.databricks.spark.redshift")
  .option("url", JdbcURL)
  .option("dbtable", "public.dwdate2")
  .option("user", RedshiftUserName)
  .option("aws_iam_role", IamRoleUrl)
  .option("password", "Rs030629")
  .option("tempdir", S3TempTable)
  .save()
```

```
17/10/31 00:23:27 WARN Utils$: The S3 bucket dwh-mlp-test does not have an object lifecycle configuration to ensure cleanup of temporary files. Consider configuring tempdir to point to a bucket with an object lifecycle policy that automatically deletes files after an expiration period. For more information, see https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
[Stage 1:> (0 + 2) / 2]17/10/31 00:23:31 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, ip-10-226-41-36.ec2.internal, executor 1): org.apache.spark.SparkException: Task failed while writing rows
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
```
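For context: the `Caused by` line points at `OutputWriterFactory.getFileExtension`, a method Spark calls on its Avro output writer during the write path in Spark 2.2. An `AbstractMethodError` there usually means a data-source writer compiled against an older Spark (2.0/2.1) is on the classpath and does not implement the method, so writes fail while reads still work. A hedged workaround sketch, assuming the bundled `spark-avro` is the incompatible jar — the artifact versions below are assumptions to verify against the release notes, not a confirmed fix:

```
# Launch with a Spark-2.2-compatible spark-avro pinned explicitly
# alongside spark-redshift (versions are assumptions to verify)
spark-shell \
  --packages com.databricks:spark-redshift_2.11:3.0.0-preview1,com.databricks:spark-avro_2.11:4.0.0
```

If the error persists, checking `sc.listJars` (or the EMR classpath) for a stale `spark-avro` jar shaded into another dependency would be the next step.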

vnktsh commented 6 years ago

Did you debug? What are your findings?