Closed — sreelakshmiRajula closed this issue 9 years ago
It seems your Hadoop cluster was corrupted. Can you confirm your HDFS health via http://
Hi, my Hadoop cluster is fine. The issue is only with indexing; I am able to run other tests as well, and I am monitoring DFS health too. For indexing, the test runs successfully for 100,000 pages; the exception occurs when the number of pages is increased. Before running the test, I clear all files from the datanode and format the namenode as well, but the exception still occurs.
It seems this is a problem caused by accessing HDFS concurrently: some task clears the file on exit while other tasks still need to access it. This may be a bug in the nutchindexing workload included in HiBench.
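The race described above can be sketched locally. This is a minimal illustration only, using plain files and threads to stand in for HDFS and MapReduce tasks; all file and function names here are hypothetical, not taken from Nutch or HiBench:

```python
import os
import tempfile
import threading
import time

# Sketch of the race: one "task" keeps writing to a part file while a
# "cleanup" step deletes it out from under the writer. On HDFS the writer
# holds a lease on the file, so the deletion surfaces as
# LeaseExpiredException ("No lease on ... File does not exist"); on a local
# POSIX filesystem the delete succeeds silently and the writer's remaining
# output is simply lost.

def writer_task(path, written):
    with open(path, "w") as f:          # analogous to a reducer holding a lease
        for i in range(5):
            f.write("record %d\n" % i)
            f.flush()
            time.sleep(0.01)
            written.append(i)

def cleanup_task(path):
    time.sleep(0.02)                    # fires while the writer is still active
    os.remove(path)                     # analogous to another task clearing the segment

tmpdir = tempfile.mkdtemp()
part = os.path.join(tmpdir, "part-00039")
open(part, "w").close()

written = []
t_write = threading.Thread(target=writer_task, args=(part, written))
t_clean = threading.Thread(target=cleanup_task, args=(part,))
t_write.start()
t_clean.start()
t_write.join()
t_clean.join()

# The writer "succeeded" from its own point of view, but the file is gone.
print("records written:", len(written))
print("file still exists:", os.path.exists(part))
```

The point of the sketch is that the writer never notices the deletion locally, whereas HDFS's lease check turns the same race into the explicit LeaseExpiredException reported in this issue, which is why it only shows up when concurrent tasks touch the same segment paths.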
The nutchindexing included in HiBench is an earlier version (apache-nutch-1.2), while the newest is 1.10 on the 1.x branch, and there is also a 2.x branch. However, the newer versions are more tightly coupled, which makes them harder for HiBench to integrate as a benchmark workload. We've tried to upgrade the version of nutchindexing, but nutch-1.2 is the best we could get for now...
I am using nutch-1.2, and I face this issue with a Hadoop setup on two PCs. I do not face this issue with a single-PC Hadoop setup.
So have you encountered a similar issue with a manually deployed nutch-1.2? Basically, HiBench just sets up a minimal system for you; it should be no different from manual execution.
And what Hadoop version and distribution are you using?
Yes, I am facing the issue with nutch-1.2, and it works fine with the single-PC setup. I am using hadoop-2.7.0 (compiled by jenkins on 2015-04-10T18:40Z, compiled with protoc 2.5.0).
If you encounter the same issue with your manually deployed nutch-1.2, I think you should file a bug on Nutch's developer list. Maybe they'll have a workaround or even a hotfix.
OK, I will do that. Thank you.
Hi,
I am trying to run HiBench's Nutch indexing workload. When I try to generate the data for 2 million pages, I get the following error after map 100% and reduce 100%. If anyone has faced a similar issue, please suggest how I can solve it.
I have a Hadoop setup with 2 PCs. Please find the logs below:
INFO mapreduce.Job: map 100% reduce 100%
15/06/03 15:10:50 INFO mapreduce.Job: Task Id : attempt_1433322917078_0002_r_000039_0, Status : FAILED
Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /HiBench/Nutch/Input/segments/20150603144633/parse_text/part-00039/data (inode 17350): File does not exist. Holder DFSClient_attempt_1433322917078_0002_r_000039_0_2042486051_1 does not have any open files.
.................................................................
15/06/03 15:11:12 INFO mapreduce.Job: Job job_1433322917078_0002 completed successfully
15/06/03 15:11:12 INFO mapreduce.Job: Counters: 50
Thanks in advance.