I have been getting the following error once every few hours from my Spark Streaming application that writes into Redshift:
16/03/31 02:13:54 ERROR JobScheduler: Error running job streaming job 1459390400000 ms.1
java.sql.SQLException: Amazon Invalid operation: S3ServiceException:The specified key does not exist.,Status 404,Error NoSuchKey,Rid 5E230F743BE2BB84,ExtRid Q5XC1Qy2dn7G4jiSL5r80ZMDFJL16oYd6iDDMGDTucCPySaJVgHnexDtAC4r286i,CanRetry 1
Details:
error: S3ServiceException:The specified key does not exist.,Status 404,Error NoSuchKey,Rid 5E230F743BE2BB84,ExtRid Q5XC1Qy2dn7G4jiSL5r80ZMDFJL16oYd6iDDMGDTucCPySaJVgHnexDtAC4r286i,CanRetry 1
code: 8001
context: S3 key being read : s3://yada/spark_files/agged_log/2016/03/31/02/9c0173ff-7630-4a46-8dc6-86ff227610cb/part-r-00007-ea400618-15e7-4f64-af7e-490c2792258c.avro
query: 183119
location: table_s3_scanner.cpp:353
process: query0_24 [pid=12849]
-----------------------------------------------;
at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(ErrorResponse.java:1830)
at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(PGMessagingContext.java:804)
at com.amazon.redshift.client.PGMessagingContext.handleMessage(PGMessagingContext.java:642)
at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(InboundMessagesPipeline.java:312)
at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(PGMessagingContext.java:1062)
at com.amazon.redshift.client.PGMessagingContext.getErrorResponse(PGMessagingContext.java:1030)
at com.amazon.redshift.client.PGClient.handleErrorsScenario2ForPrepareExecution(PGClient.java:2417)
at com.amazon.redshift.client.PGClient.handleErrorsPrepareExecute(PGClient.java:2358)
at com.amazon.redshift.client.PGClient.executePreparedStatement(PGClient.java:1358)
at com.amazon.redshift.dataengine.PGQueryExecutor.executePreparedStatement(PGQueryExecutor.java:370)
at com.amazon.redshift.dataengine.PGQueryExecutor.execute(PGQueryExecutor.java:245)
at com.amazon.jdbc.common.SPreparedStatement.executeWithParams(Unknown Source)
at com.amazon.jdbc.common.SPreparedStatement.execute(Unknown Source)
at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:122)
at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:122)
at com.databricks.spark.redshift.JDBCWrapper$$anonfun$2.apply(RedshiftJDBCWrapper.scala:140)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
This is the code I'm running:
dfrdd.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift:/yadayada")
  .option("dbtable", "aggregated_logs_5")
  .option("tempdir", "s3n://yada/" + DateTime.now().toString(logS3TempFolderformatter))
  .option("extracopyoptions", "TRUNCATECOLUMNS")
  .mode(SaveMode.Append)
  .save()
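The error itself reports CanRetry 1, so the only code-level workaround I can think of is a crude retry around the whole write. Here is a minimal sketch of what I mean (retryWrite is a hypothetical helper, not something in my current code, and the back-off numbers are just placeholders):

// Hypothetical helper, not in my code today: re-runs the whole write a few
// times, since the failed COPY reports "CanRetry 1". Each retry goes through
// a fresh tempdir because the tempdir path includes DateTime.now().
def retryWrite(maxAttempts: Int)(write: => Unit): Unit = {
  var attempt = 1
  var succeeded = false
  while (!succeeded) {
    try {
      write
      succeeded = true
    } catch {
      case e: java.sql.SQLException if attempt < maxAttempts =>
        attempt += 1
        Thread.sleep(5000L * attempt) // crude back-off before retrying the COPY
    }
  }
}

// Usage would just wrap the snippet above:
// retryWrite(maxAttempts = 3) {
//   dfrdd.write.format("com.databricks.spark.redshift") ... .save()
// }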
Or is this something that can be solved with configuration changes alone?