aalkilani / spark-kafka-cassandra-applying-lambda-architecture

Error when saving to hdfs #15

Closed aasheikh closed 7 years ago

aasheikh commented 7 years ago

Hey,

I wonder why the following exception is thrown when executing the line below. I can browse HDFS from the NameNode UI at http://lambda-pluralsight:50070.

activityByProduct.write.partitionBy("timestamp_hour").mode(SaveMode.Append).parquet("hdfs://lambda-pluralsight:9000/lambda/batch1")

17/01/28 21:52:22 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 107.7 KB)
17/01/28 21:52:22 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.2 KB, free 125.8 KB)
17/01/28 21:52:22 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:10795 (size: 18.2 KB, free: 2.4 GB)
17/01/28 21:52:22 INFO SparkContext: Created broadcast 0 from textFile at BatchJob.scala:36
17/01/28 21:52:25 INFO FileInputFormat: Total input paths to process : 1
Exception in thread "main" java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "asheikh-QAL51/127.0.0.1"; destination host is: "lambda-pluralsight":9000;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:73)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:329)
    at batch.BatchJob$.main(BatchJob.scala:67)
    at batch.BatchJob.main(BatchJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
17/01/28 21:52:25 INFO SparkContext: Invoking stop() from shutdown hook
17/01/28 21:52:25 INFO SparkUI: Stopped Spark web UI at http://10.20.6.174:4041
17/01/28 21:52:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/01/28 21:52:25 INFO MemoryStore: MemoryStore cleared
17/01/28 21:52:25 INFO BlockManager: BlockManager stopped
17/01/28 21:52:25 INFO BlockManagerMaster: BlockManagerMaster stopped
17/01/28 21:52:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/01/28 21:52:25 INFO SparkContext: Successfully stopped SparkContext
17/01/28 21:52:25 INFO ShutdownHookManager: Shutdown hook called
17/01/28 21:52:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-e725e6ec-9bf0-4151-8624-9e4e588b532c

aalkilani commented 7 years ago

@aasheikh, I saw this come up the other day but haven't had a chance to try to replicate it. Was there something in particular that solved this for you? It would be good to keep a reference here. Thanks!
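
No resolution is recorded in this thread. For anyone hitting the same trace, a possible first diagnostic step is to check whether the NameNode RPC port from the failing URI accepts a plain TCP connection from the machine running the job. The sketch below is only that check, not a fix: the object name NameNodeCheck and the helper canConnect are hypothetical, and the host and port defaults are copied from the path in the original post.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper, not part of the course code or of Hadoop.
object NameNodeCheck {

  // Returns true if a TCP connection to host:port can be opened within timeoutMs.
  // Unknown hosts, refused connections and timeouts all surface as IOException.
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Defaults taken from hdfs://lambda-pluralsight:9000/lambda/batch1 in the post above.
    val host = if (args.length > 0) args(0) else "lambda-pluralsight"
    val port = if (args.length > 1) args(1).toInt else 9000
    val status = if (canConnect(host, port)) "reachable" else "NOT reachable"
    println(s"$host:$port is $status")
  }
}
```

If the port is reachable and the EOFException still occurs, the connection is being opened and then dropped during the RPC exchange. A cause often reported for that pattern is a mismatch between the Hadoop client libraries on the job's classpath and the Hadoop version running on the cluster, though nothing in this thread confirms that is what happened here. The client-side version can be printed with org.apache.hadoop.util.VersionInfo.getVersion and compared with the version the cluster reports.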