Version: trunk@464
If a job has a series of DTRs and the input data to the first step changes, then a
downstream "for" DTR will fail unless its output directory is manually cleared
before the job is run.
This behavior is (I think) new as of r358. It prevents the dataflow
from working the way I think it was intended to.
The example hamake job can be used to reproduce this issue.
1. build trunk
2. cd dist/examples/class-size-median
3. export HADOOP_HOME=<whatever>
4. run the job using the script
bin/run.sh working
5. add any jar to the data directory
hadoop fs -put hamake-2.0b-4.jar working/data
6. export RUN_FOLDER=working
7. manually run the job:
hadoop jar hamake-2.0b-4.jar -f file:///${PWD}/hamakefiles/class-size.xml
The new jar is processed by the first two "foreach" DTRs, but then the
histogram "for" DTR fails:
12/10/26 15:59:39 ERROR security.UserGroupInformation: PriviledgedActionException as:jlent cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:8020/user/jlent/working/result/class-size-histogram already exists
12/10/26 15:59:39 ERROR task.MapReduce: Failed to execute Hadoop command hdfs://localhost:8020/user/jlent/working/hamake-examples-2.0b-4.jar/com.codeminders.hamake.examples.ClassSizeHistogram
java.lang.Exception: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:8020/user/jlent/working/result/class-size-histogram already exists
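For reference, the manual workaround amounts to removing the stale output directory before the "for" DTR runs. Here is a rough sketch of that idea, not hamake code: it uses java.nio on the local filesystem as a stand-in for HDFS, and the class and method names (StaleOutputCleaner, clearIfPresent) are made up for illustration. On HDFS the analogous call would be FileSystem.delete(path, true).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, local-filesystem stand-in for clearing an HDFS output directory.
class StaleOutputCleaner {

    // Deletes outputDir and everything under it.
    // Returns true if the directory existed and was removed, false if there was nothing to do.
    static boolean clearIfPresent(Path outputDir) throws IOException {
        if (!Files.exists(outputDir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(outputDir)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a leftover output directory from a previous run.
        Path out = Files.createTempDirectory("class-size-histogram");
        Files.createFile(out.resolve("part-00000"));
        System.out.println("removed stale output: " + clearIfPresent(out));
        System.out.println("second call had work to do: " + clearIfPresent(out));
    }
}
```

Whether hamake itself should do this automatically for a "for" DTR whose inputs changed is a separate question; the sketch only shows what clearing the directory by hand involves.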
One additional problem: if the job is run without first adding a jar to
the working directory, the initial DTR fails on the line:
Foreach.java, line 310:
LOG.info(getName() + ": Completed " + fetcher.getCounter() + " tasks, " + fetcher.getErrors() + " tasks with errors, average run time: " + fetcher.getTotalRunTime() / (fetcher.getCounter() + fetcher.getErrors()) + " ms");
because the denominator is zero. This is easy to fix. I just made it:
if (fetcher.getCounter() + fetcher.getErrors() > 0) {
    LOG.info(getName() + ": Completed " + fetcher.getCounter() + " tasks, "
        + fetcher.getErrors() + " tasks with errors, average run time: "
        + fetcher.getTotalRunTime() / (fetcher.getCounter() + fetcher.getErrors()) + " ms");
} else {
    LOG.info("Output of " + getName() + " is already present and fresh");
}
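The same guard can be tried outside hamake. The following is a minimal standalone sketch under a few assumptions: RunSummary and summarize are hypothetical names, the parameters stand in for the fetcher.getCounter(), fetcher.getErrors() and fetcher.getTotalRunTime() values, and the counts and run time are assumed to be integer types (which is what makes the unguarded division throw ArithmeticException).

```java
// Hypothetical standalone version of the guarded log message.
class RunSummary {

    // completed, errors and totalRunTimeMs stand in for the fetcher values.
    static String summarize(String name, int completed, int errors, long totalRunTimeMs) {
        int total = completed + errors;
        if (total > 0) {
            // Safe: the denominator is known to be non-zero here.
            return name + ": Completed " + completed + " tasks, " + errors
                + " tasks with errors, average run time: " + (totalRunTimeMs / total) + " ms";
        }
        // No tasks ran, so there is no average to report.
        return "Output of " + name + " is already present and fresh";
    }

    public static void main(String[] args) {
        System.out.println(summarize("histogram", 3, 1, 400L));
        System.out.println(summarize("histogram", 0, 0, 0L));
    }
}
```

With no tasks run, the second call returns the "already present and fresh" message instead of dividing by zero.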
Original issue reported on code.google.com by jwl...@gmail.com on 26 Oct 2012 at 8:12