Big-Data-Manning / big-data-code

Source code for Big Data: Principles and best practices of scalable realtime data systems

Batch Layer Code Fails on Hadoop 2.7.0, Linux Ubuntu Trusty 14-04 #1

Open nagarajanchinnasamy opened 9 years ago

nagarajanchinnasamy commented 9 years ago

I am having trouble getting the batch layer to run successfully.

I added a main function to BatchWorkflow.java in the manning.batchlayer package as follows:

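public static void main(String[] args) throws Exception {
    initTestData();   // set up the test data
    batchWorkflow();  // run the batch workflow
}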

When I run this on Hadoop (with 1 NameNode and 2 DataNodes), it fails at:

batchWorkflow()->ingest()->appendNewDataToMasterDataPail()->shred()->Api.execute()

with the following output and exception:

15/07/17 10:16:20 INFO flow.FlowStep: [] submitted hadoop job: job_1437122465719_0010
15/07/17 10:17:19 WARN flow.FlowStep: [] task completion events identify failed tasks
15/07/17 10:17:19 WARN flow.FlowStep: [] task completion events count: 5
15/07/17 10:17:19 WARN flow.FlowStep: [] event = Task Id : attempt_1437122465719_0010_m_000000_0, Status : SUCCEEDED
15/07/17 10:17:19 WARN flow.FlowStep: [] event = Task Id : attempt_1437122465719_0010_r_000000_0, Status : FAILED
15/07/17 10:17:19 WARN flow.FlowStep: [] event = Task Id : attempt_1437122465719_0010_r_000000_1, Status : FAILED
15/07/17 10:17:19 WARN flow.FlowStep: [] event = Task Id : attempt_1437122465719_0010_r_000000_2, Status : FAILED
15/07/17 10:17:19 WARN flow.FlowStep: [] event = Task Id : attempt_1437122465719_0010_r_000000_3, Status : TIPFAILED
15/07/17 10:17:19 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/17 10:17:19 INFO flow.Flow: [] stopping all jobs
15/07/17 10:17:19 INFO flow.FlowStep: [] stopping: (1/1) /tmp/swa/shredded
15/07/17 10:17:19 INFO impl.YarnClientImpl: Killed application application_1437122465719_0010
15/07/17 10:17:19 INFO flow.Flow: [] stopped all jobs
15/07/17 10:17:19 INFO util.Hadoop18TapUtil: deleting temp path /tmp/swa/shredded/_temporary
Exception in thread "main" cascading.flow.FlowException: step failed: (1/1) /tmp/swa/shredded, with job id: job_1437122465719_0010, please see cluster logs for failure messages
        at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:193)
        at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
        at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
        at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Any help fixing this issue would be appreciated. Thanks.