It doesn't look related to that; these files have already been uploaded to HDFS.
I don't quite follow; could you provide a bit more background information?
This run uses input.xml as the parameter file. Stepping through the program shows the error occurs at line 211; the messages below were thrown when the call to copytFileToHDFS failed to execute normally.
[sangzhe@localhost mr-tandem]$ python mr-tandem.py general_config.json input.xml
15/07/12 03:25:43 unable to invoke local copy of hadoop for file transfer purposes
15/07/12 03:25:43 global name 'lastlogtime' is not defined
15/07/12 03:25:43 !!!
15/07/12 03:25:43 Hadoop cluster access doesn't seem to be set up!
15/07/12 03:25:43 !!!
15/07/12 03:25:43 if you haven't already initiated a proxy to your hadoop gateway, open another shell and leave the following command running in it:
15/07/12 03:25:43 "ssh -D 6789 -n -N <your_hadoop_username>@<your_hadoop_gateway>"
15/07/12 03:25:43 see https://univsupport.hipods.ihost.com/documents/7/ for details on hadoop gateway proxies
15/07/12 03:25:43 and you'll need to install hadoop on your local machine, to get the hadoop file transfer commands working.
15/07/12 03:25:43 see http://hadoop.apache.org/common/docs/r0.15.2/quickstart.html and http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-windows/
However, checking the files on HDFS shows they have already been copied over:
[sangzhe@localhost mr-tandem]$ hadoop fs -ls /home/sangzhe/hadoop/hadoop-tmp/input.xml_runs/20150712032540
Found 2 items
-rw-r--r-- 1 sangzhe supergroup 10619 2015-07-12 11:25 /home/sangzhe/hadoop/hadoop-tmp/input.xml_runs/20150712032540/__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__default_input.xml
-rw-r--r-- 1 sangzhe supergroup 1106 2015-07-12 11:25 /home/sangzhe/hadoop/hadoop-tmp/input.xml_runs/20150712032540/__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__input.xml
So I still haven't figured out where the problem is.
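The mangled names in the listing above follow MR-Tandem's path-escaping convention: each "/" in a local path is replaced by the token `__sl__` so the entire path can serve as a single HDFS file name. A minimal sketch of the convention (the helper names here are illustrative, not taken from mr-tandem.py):

```python
# Sketch of the path mangling visible in the HDFS listing above:
# "/" in a local path becomes the token "__sl__" so the whole path
# fits into one flat HDFS file name.

def mangle_path(local_path):
    # "/home/sangzhe/input.xml" -> "__sl__home__sl__sangzhe__sl__input.xml"
    return local_path.replace("/", "__sl__")

def unmangle_path(hdfs_name):
    # Inverse mapping back to the original local path.
    return hdfs_name.replace("__sl__", "/")
```

For example, `mangle_path("/home/sangzhe/input.xml")` yields `__sl__home__sl__sangzhe__sl__input.xml`, matching the file names in the listing.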
Could you describe the exact steps you ran? I'll try to reproduce it.
[sangzhe@localhost mr-tandem]$ python mr-tandem.py general_config.json input.xml
15/07/12 04:58:28 unable to invoke local copy of hadoop for file transfer purposes
15/07/12 04:58:28 global name 'lastlogtime' is not defined
15/07/12 04:58:28 !!!
15/07/12 04:58:28 Hadoop cluster access doesn't seem to be set up!
15/07/12 04:58:28 !!!
15/07/12 04:58:28 if you haven't already initiated a proxy to your hadoop gateway, open another shell and leave the following command running in it:
15/07/12 04:58:28 "ssh -D 6789 -n -N <your_hadoop_username>@<your_hadoop_gateway>"
15/07/12 04:58:28 see https://univsupport.hipods.ihost.com/documents/7/ for details on hadoop gateway proxies
15/07/12 04:58:28 and you'll need to install hadoop on your local machine, to get the hadoop file transfer commands working.
15/07/12 04:58:28 see http://hadoop.apache.org/common/docs/r0.15.2/quickstart.html and http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-windows/
Using tandem_params.xml as the input file gives the same result.
Running again with tandem_params.xml produced a different error. After I fixed two paths, execution continued to line 683, mrh.doHadoopStep, which then threw this:
15/07/12 05:28:21 Exception in thread "main" java.io.IOException: Error opening job jar: /home/sangzhe/hadoop/contrib/streaming/hadoop-streaming.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.io.FileNotFoundException: /home/sangzhe/hadoop/contrib/streaming/hadoop-streaming.jar (No such file or directory)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:220)
at java.util.zip.ZipFile.<init>(ZipFile.java:150)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:103)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
Trying input.xml again still fails.
/home/sangzhe/hadoop/contrib/streaming/hadoop-streaming.jar — is it looking for this file?
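The streaming jar's file name varies between Hadoop releases (later in this thread it appears as hadoop-0.20.1+152-streaming.jar rather than hadoop-streaming.jar), so a hard-coded path breaks easily. One way around that would be a glob-based lookup; this is a sketch of the idea, not how mr-tandem.py actually resolves the jar:

```python
import glob
import os

def find_streaming_jar(hadoop_home):
    # Hadoop releases name the streaming jar differently, e.g.
    # contrib/streaming/hadoop-streaming.jar or
    # contrib/streaming/hadoop-0.20.1+152-streaming.jar,
    # so match any "*streaming*.jar" under contrib/streaming.
    pattern = os.path.join(hadoop_home, "contrib", "streaming", "*streaming*.jar")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise IOError("no streaming jar matching %s" % pattern)
    return matches[0]
```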
Yes, I've already fixed that. After one map/reduce pass it immediately hit another problem:
[sangzhe@localhost mr-tandem]$ python mr-tandem.py general_config.json tandem_params.xml
15/07/12 07:07:15 scripts, config and logs will be written to tandem_params.xml_runs/20150712070708
15/07/12 07:07:15 begin execution
15/07/12 07:07:15 running Hadoop step tandem_params.xml-condition
15/07/12 15:07:17 INFO streaming.StreamJob: Running job: job_201507121118_0003
15/07/12 15:07:18 INFO streaming.StreamJob: map 0% reduce 0%
15/07/12 15:07:25 INFO streaming.StreamJob: map 100% reduce 0%
15/07/12 15:07:37 INFO streaming.StreamJob: map 100% reduce 100%
15/07/12 15:07:40 INFO streaming.StreamJob: Job complete: job_201507121118_0003
15/07/12 07:07:40 running Hadoop step tandem_params.xml-process-final
15/07/12 07:07:41 problem running command:
15/07/12 07:07:41 /home/sangzhe/hadoop/bin/hadoop jar /home/sangzhe/hadoop/contrib/streaming/hadoop-streaming.jar -jobconf mapred.task.timeout=360000000 -jobconf mapred.reduce.tasks=1 -jobconf mapred.map.tasks=1 -jobconf mapred.reduce.tasks.speculative.execution=false -jobconf mapred.map.tasks.speculative.execution=false -mapper tandem_20150710000349 -mapper2_1 hdfs:///home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708 __sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__tandem_params.xml -reducer tandem_20150710000349 -reducer2_1.1 hdfs:///home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708 __sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__tandem_params.xml -reportURL hdfs:///home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/output -input hdfs:///home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/output1.1/ -output hdfs:///home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/results -cacheFile hdfs:///home/sangzhe/hadoop/hadoop-tmp//home/sangzhe/github/project1/mrtandam-ica-code/mr-tandem/test_spectra.mgf.gz#__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__test_spectra.mgf.gz -cacheFile hdfs:///home/sangzhe/hadoop/hadoop-tmp//home/sangzhe/github/project1/mrtandem_fasta/crap.fasta.pro#__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandem_fasta__sl__crap.fasta.pro -cacheFile hdfs:///home/sangzhe/hadoop/hadoop-tmp//home/sangzhe/github/project1/mrtandem_fasta/scd_1.fasta.pro#__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandem_fasta__sl__scd_1.fasta.pro -cacheFile hdfs:///home/sangzhe/hadoop/hadoop-tmp//home/sangzhe/github/project1/mrtandem_bin/tandem_20150710000349#tandem_20150710000349 -cacheFile 
hdfs:///home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__tandem_params.xml#__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__tandem_params.xml -cacheFile hdfs:///home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__taxonomy.xml#__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__taxonomy.xml -cacheFile hdfs:///home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__isb_default_input_kscore.xml#__sl__home__sl__sangzhe__sl__github__sl__project1__sl__mrtandam-ica-code__sl__mr-tandem__sl__isb_default_input_kscore.xml -cacheFile hdfs:///home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/reducer1_1#reducer1_1
15/07/12 15:07:40 WARN streaming.StreamJob: -cacheFile option is deprecated, please use -files instead.
15/07/12 15:07:40 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
packageJobJar: [/home/sangzhe/hadoop/hadoop-tmp/hadoop-unjar3026221571832876555/] [] /tmp/streamjob2390578920602291803.jar tmpDir=null
15/07/12 15:07:41 ERROR streaming.StreamJob: Error launching job , bad input path : File does not exist: /home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/reducer1_1
Streaming Command Failed!
15/07/12 07:07:41 return code 2
15/07/12 07:07:42 problem running command:
15/07/12 07:07:42 /home/sangzhe/hadoop/bin/hadoop fs -copyToLocal /home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/results/part-00000 /tmp/tmpx9qluV
15/07/12 07:07:42 copyToLocal: null
15/07/12 07:07:42 return code 255
15/07/12 07:07:53 problem running command:
15/07/12 07:07:53 /home/sangzhe/hadoop/bin/hadoop fs -copyToLocal /home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/results/part-00000 /tmp/tmpvAvzE4
15/07/12 07:07:53 copyToLocal: null
15/07/12 07:07:53 return code 255
15/07/12 07:08:03 problem running command:
15/07/12 07:08:03 /home/sangzhe/hadoop/bin/hadoop fs -copyToLocal /home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/results/part-00000 /tmp/tmpr8ugNz
15/07/12 07:08:03 copyToLocal: null
15/07/12 07:08:03 return code 255
15/07/12 07:08:14 problem running command:
15/07/12 07:08:14 /home/sangzhe/hadoop/bin/hadoop fs -copyToLocal /home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/results/part-00000 /tmp/tmpYyxz37
15/07/12 07:08:14 copyToLocal: null
15/07/12 07:08:14 return code 255
15/07/12 07:08:25 problem running command:
15/07/12 07:08:25 /home/sangzhe/hadoop/bin/hadoop fs -copyToLocal /home/sangzhe/hadoop/hadoop-tmp/tandem_params.xml_runs/20150712070708/results/part-00000 /tmp/tmpwh0302
15/07/12 07:08:25 copyToLocal: null
15/07/12 07:08:25 return code 255
15/07/12 07:08:25 [Errno 2] No such file or directory: '/tmp/tmpwh0302'
15/07/12 07:08:25 elapsed time = 0:01:17.030694
This run should count as a success, right?
[sangzhe@localhost mr-tandem]$ python mr-tandem.py general_config.json tandem_params.xml
15/07/13 02:10:51 uploading /home/sangzhe/github/project1/mrtandem_bin/tandem to /home/sangzhe/hadoop//home/sangzhe/github/project1/mrtandem_bin/tandem_20150710000349
15/07/13 02:10:53 uploading /home/sangzhe/github/project1/mrtandem_fasta/scd_1.fasta.pro to /home/sangzhe/hadoop//home/sangzhe/github/project1/mrtandem_fasta/scd_1.fasta.pro
15/07/13 02:10:54 uploading /home/sangzhe/github/project1/mrtandem_fasta/crap.fasta.pro to /home/sangzhe/hadoop//home/sangzhe/github/project1/mrtandem_fasta/crap.fasta.pro
15/07/13 02:10:55 uploading /home/sangzhe/github/project1/mrtandam-ica-code/mr-tandem/test_spectra.mgf.gz to /home/sangzhe/hadoop//home/sangzhe/github/project1/mrtandam-ica-code/mr-tandem/test_spectra.mgf.gz
15/07/13 02:10:58 scripts, config and logs will be written to tandem_params.xml_runs/20150713021047
15/07/13 02:10:58 begin execution
15/07/13 02:10:58 running Hadoop step tandem_params.xml-condition
15/07/13 10:10:59 INFO streaming.StreamJob: Running job: job_201507130633_0013
15/07/13 10:11:00 INFO streaming.StreamJob: map 0% reduce 0%
15/07/13 10:11:08 INFO streaming.StreamJob: map 100% reduce 0%
15/07/13 10:11:20 INFO streaming.StreamJob: map 100% reduce 100%
15/07/13 10:11:23 INFO streaming.StreamJob: Job complete: job_201507130633_0013
15/07/13 02:11:23 running Hadoop step tandem_params.xml-process-final
15/07/13 10:11:24 INFO streaming.StreamJob: Running job: job_201507130633_0014
15/07/13 10:11:25 INFO streaming.StreamJob: map 0% reduce 0%
15/07/13 10:11:32 INFO streaming.StreamJob: map 100% reduce 0%
15/07/13 10:11:44 INFO streaming.StreamJob: map 100% reduce 100%
15/07/13 10:11:47 INFO streaming.StreamJob: Job complete: job_201507130633_0014
15/07/13 02:11:51 Done. X!Tandem logs:
15/07/13 02:11:14 reducer1_1: X! TANDEM 2010.10.01.1 (LabKey, Insilicos and ISB)
15/07/13 02:11:14 reducer1_1: read lines from mapper_1 to estabish mapper count
15/07/13 02:11:14 reducer1_1: 1 mappers reporting in with 1 unique hostnames
15/07/13 02:11:14 reducer1_1: Loading spectra .15/07/13 02:11:14 reducer1_1: loaded.
15/07/13 02:11:14 reducer1_1: Spectra matching criteria = 653
15/07/13 02:11:14 reducer1_1: Pluggable scoring enabled.
15/07/13 02:11:14 reducer1_1: Starting threads .15/07/13 02:11:14 reducer1_1: send 1 processes to mapper_2
15/07/13 02:11:14 reducer1_1: output a process with 653 spectra for task 00001
15/07/13 02:11:29 mapper2_1: X! TANDEM 2010.10.01.1 (LabKey, Insilicos and ISB)
15/07/13 02:11:29 mapper2_1: loading processes...15/07/13 02:11:29 mapper2_1.00001: loaded process with spectra count = 653
done.
15/07/13 02:11:29 mapper2_1.00001: Computing models on 653 spectra:
1215/07/13 02:11:30 mapper2_1.00001: output a process with 653 spectra for task 00001
15/07/13 02:11:38 reducer2_1: X! TANDEM 2010.10.01.1 (LabKey, Insilicos and ISB)
15/07/13 02:11:38 reducer2_1: loading processes...15/07/13 02:11:38 reducer2_1: loaded process with spectra count = 653
done.
15/07/13 02:11:38 reducer2_1: Creating report:
initial calculations ..... done.
sorting ..... done.
finding repeats ..... done.
evaluating results ..... done.
calculating expectations ..... done.
writing results writing output to hadoop task directory as /home/sangzhe/hadoop/hadoop-tmp/mapred/local/taskTracker/jobcache/job_201507130633_0014/work/output
..... done.
15/07/13 02:11:38 reducer2_1:
Valid models = 9
15/07/13 02:11:51 (this log is also written to tandem_params.xml_runs/20150713021047/results.txt)
15/07/13 02:11:51 Results written to output
15/07/13 02:11:51 elapsed time = 0:01:03.377650
[sangzhe@localhost mr-tandem]$
It's still somewhat baffling; I don't know where it went wrong before. Sometimes it gets stuck at copyToHDFS, sometimes at copyToLocal. Or the first run shows "unable to invoke local copy of hadoop for file transfer purposes", and then running it again gets further.
This time I changed hadoop_dir from the previous /home/sangzhe/hadoop/hadoop-tmp to /home/sangzhe/hadoop, and it went through cleanly in one pass. So perhaps the HDFS directory path was wrong after all?
Great! Could you create a branch on GitHub and commit & push it?
Still not working. I tried to repeat it again this afternoon and couldn't reproduce the success; I'll keep trying.
It just succeeded again. I wonder whether the program is simply unstable: with the same configuration, earlier runs failed at different points, and a rerun happened to succeed. Professor, have you run into this situation as well?
I'm at the office right now and can't test; I'll test it once I'm back.
Try whether you can open this address:
http://localhost.localdomain:50070/dfshealth.jsp
This checks the NameNode.
And on another port:
http://localhost.localdomain:50030/jobtracker.jsp
This checks the JobTracker.
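Those two pages are the standard web UIs of Hadoop 0.20-era daemons: the NameNode serves dfshealth.jsp on port 50070 and the JobTracker serves jobtracker.jsp on port 50030. A small sketch that builds the two check URLs for any host (the helper is mine, not part of mr-tandem):

```python
NAMENODE_PORT = 50070    # NameNode web UI (dfshealth.jsp)
JOBTRACKER_PORT = 50030  # JobTracker web UI (jobtracker.jsp)

def status_urls(host):
    # Build the two health-check URLs for a Hadoop 0.20-era cluster.
    return {
        "namenode": "http://%s:%d/dfshealth.jsp" % (host, NAMENODE_PORT),
        "jobtracker": "http://%s:%d/jobtracker.jsp" % (host, JOBTRACKER_PORT),
    }
```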
I'm getting a different error here:
15/07/13 19:53:14 global name 'lastlogtime' is not defined
15/07/13 19:53:14 !!!
15/07/13 19:53:14 Hadoop cluster access doesn't seem to be set up!
15/07/13 19:53:14 !!!
15/07/13 19:53:14 if you haven't already initiated a proxy to your hadoop gateway, open another shell and leave the following command running in it:
15/07/13 19:53:14 "ssh -D 6789 -n -N localhost.localdomain@10.211.55.1"
15/07/13 19:53:14 see https://univsupport.hipods.ihost.com/documents/7/ for details on hadoop gateway proxies
15/07/13 19:53:14 and you'll need to install hadoop on your local machine, to get the hadoop file transfer commands working.
15/07/13 19:53:14 see http://hadoop.apache.org/common/docs/r0.15.2/quickstart.html and http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-windows/
I can open both addresses. That error should be fixable like this: manually create the directory in HDFS matching the hadoop_dir path in general_config.json, then rerun and it should work.
@sangzhe you added these two lines to general_config:
"hadoop_gateway":"10.211.55.1",
"hadoop_user":"localhost.localdomain"
Could you explain where they come from and how to set up the related environment?
I'm hitting this error:
15/07/13 19:53:14 global name 'lastlogtime' is not defined
15/07/13 19:53:14 !!!
15/07/13 19:53:14 Hadoop cluster access doesn't seem to be set up!
15/07/13 19:53:14 !!!
15/07/13 19:53:14 if you haven't already initiated a proxy to your hadoop gateway, open another shell and leave the following command running in it:
15/07/13 19:53:14 "ssh -D 6789 -n -N localhost.localdomain@10.211.55.1"
15/07/13 19:53:14 see https://univsupport.hipods.ihost.com/documents/7/ for details on hadoop gateway proxies
15/07/13 19:53:14 and you'll need to install hadoop on your local machine, to get the hadoop file transfer commands working.
15/07/13 19:53:14 see http://hadoop.apache.org/common/docs/r0.15.2/quickstart.html and http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-windows/
In the dev branch I have removed those two lines again. I added them while reading MR-Tandem_UserManual.pdf; they don't actually do anything, and are only printed by explain_hadoop when the program hits a problem, as a reminder to establish the proxy connection to the Hadoop cluster. The UserManual says to first run
ssh -D <proxy_port> -n -N <your_username>@<your_cluster_URL>
to transfer files over ssh.
For <your_username> I used localhost.localdomain, which is what running hostname in the shell returns:
[sangzhe@localhost ~]$ hostname
localhost.localdomain
For <your_cluster_URL> I used 10.211.55.1, taken from the gateway column of route -n:
[sangzhe@localhost ~]$ route -n
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.211.55.1 0.0.0.0 UG 100 0 0 eth0
10.211.55.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
10.211.55.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
Running that ssh command before starting mr-tandem.py does succeed, but my later successful runs did not perform this step at all.
I think the error you hit is caused by the path that hadoop_dir in general_config.json points to not existing. mapreduce_helper.py has a function for creating the remote directory, but it is never actually called:
def createRemoteDir(dirname):
    if runHadoop():
        args = [getHadoopBinary(), "fs", "-mkdir", dirname]
        runPipeCommand(args)
So please check whether /tmp/hadoop-mingze/dfs/name exists:
hadoop fs -ls /tmp/hadoop-mingze/dfs/name
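One way to wire that unused helper in would be to call it once with the configured hadoop_dir before the first fs -put. The sketch below keeps the same shape as the helper quoted above, but the stand-in definitions of getHadoopBinary and runPipeCommand are mine (in mapreduce_helper.py these already exist), and the injectable runner argument is added purely so it can be exercised without a live cluster:

```python
import subprocess

# Stand-ins for mapreduce_helper.py's helpers; in the real module
# these already exist, so only the mkdir call site would be new.
def getHadoopBinary():
    return "hadoop"

def runPipeCommand(args):
    # Run the hadoop CLI and return its exit code.
    return subprocess.call(args)

def createRemoteDir(dirname, runner=None):
    # Same shape as the unused helper above; the runner parameter is a
    # test seam, defaulting to the real command runner.
    runner = runner or runPipeCommand
    runner([getHadoopBinary(), "fs", "-mkdir", dirname])
```

Calling `createRemoteDir(<hadoop_dir>)` during setup would make the manual `hadoop fs -mkdir` workaround unnecessary.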
[mingze@localhost mr-tandem]$ python mr-tandem.py general_config.json tandem_params.xml
15/07/20 21:41:00 Unexpected error opening X!Tandem parameters file tandem_params.xml: [Errno 2] No such file or directory: 'tandem_params.xml'
15/07/20 21:41:00 quitting
@sangzhe have you run into this problem before?
Please copy the following files from mrtandem_bin into the directory containing mr-tandem.py.
The paths of the two fasta files inside taxonomy.xml also need to be corrected.
"I can open both addresses. That error should be fixable like this: manually create the directory in HDFS matching the hadoop_dir path in general_config.json, then rerun and it should work."
I'm now stuck at the same place you hit earlier; could you post the concrete steps described above?
@baimingze I see that your general_config.json has
"hadoop_dir": "/tmp/hadoop-mingze/dfs/name", # your HDFS directory on the cluster
so in the shell, first run
hadoop fs -mkdir /tmp/hadoop-mingze/dfs/name
I did that, and the directory does indeed exist:
[mingze@localhost hadoop-0.20.1+152]$ bin/hadoop fs -ls /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/
Found 6 items
-rw-r--r-- 1 mingze supergroup 9620 2015-07-21 22:38 /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__isb_default_input_kscore.xml
-rw-r--r-- 1 mingze supergroup 5256 2015-07-21 22:38 /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__tandem_params.xml
-rw-r--r-- 1 mingze supergroup 349 2015-07-21 22:38 /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__taxonomy.xml
-rw-r--r-- 1 mingze supergroup 47 2015-07-21 22:38 /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/mapper1-input-values
drwxr-xr-x - mingze supergroup 0 2015-07-21 22:39 /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/output1.1
-rw-r--r-- 1 mingze supergroup 966 2015-07-21 22:38 /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/tandem_params.xml.cfg.json
The error persists:
15/07/21 21:38:50 scripts, config and logs will be written to tandem_params.xml_runs/20150721213842
15/07/21 21:38:50 begin execution
15/07/21 21:38:50 running Hadoop step tandem_params.xml-condition
15/07/21 22:38:51 INFO streaming.StreamJob: Running job: job_201507212235_0003
15/07/21 22:38:52 INFO streaming.StreamJob: map 0% reduce 0%
15/07/21 22:38:59 INFO streaming.StreamJob: map 100% reduce 0%
15/07/21 22:39:11 INFO streaming.StreamJob: map 100% reduce 100%
15/07/21 22:39:14 INFO streaming.StreamJob: Job complete: job_201507212235_0003
15/07/21 21:39:14 running Hadoop step tandem_params.xml-process-final
15/07/21 21:39:15 problem running command:
15/07/21 21:39:15 /opt/hadoop-0.20.1+152/bin/hadoop jar /opt/hadoop-0.20.1+152/contrib/streaming/hadoop-0.20.1+152-streaming.jar -jobconf mapred.task.timeout=360000000 -jobconf mapred.reduce.tasks=1 -jobconf mapred.map.tasks=1 -jobconf mapred.reduce.tasks.speculative.execution=false -jobconf mapred.map.tasks.speculative.execution=false -mapper tandem_20150720211136 -mapper2_1 hdfs:///tmp/hadoop-run/tandem_params.xml_runs/20150721213842 __sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__tandem_params.xml -reducer tandem_20150720211136 -reducer2_1.1 hdfs:///tmp/hadoop-run/tandem_params.xml_runs/20150721213842 __sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__tandem_params.xml -reportURL hdfs:///tmp/hadoop-run/tandem_params.xml_runs/20150721213842/__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__output -input hdfs:///tmp/hadoop-run/tandem_params.xml_runs/20150721213842/output1.1/ -output hdfs:///tmp/hadoop-run/tandem_params.xml_runs/20150721213842/results -cacheFile hdfs:///tmp/hadoop-run//home/mingze/work/sangzhe-dev/mrtandam-ica-code/mr-tandem/test_spectra.mgf.gz#__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__test_spectra.mgf.gz -cacheFile hdfs:///tmp/hadoop-run//home/mingze/work/sangzhe-dev/mrtandem_fasta/crap.fasta.pro#__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandem_fasta__sl__crap.fasta.pro -cacheFile hdfs:///tmp/hadoop-run//home/mingze/work/sangzhe-dev/mrtandem_fasta/scd_1.fasta.pro#__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandem_fasta__sl__scd_1.fasta.pro -cacheFile hdfs:///tmp/hadoop-run//home/mingze/work/sangzhe-dev/mrtandem_bin/tandem_20150720211136#tandem_20150720211136 -cacheFile 
hdfs:///tmp/hadoop-run/tandem_params.xml_runs/20150721213842/__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__tandem_params.xml#__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__tandem_params.xml -cacheFile hdfs:///tmp/hadoop-run/tandem_params.xml_runs/20150721213842/__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__taxonomy.xml#__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__taxonomy.xml -cacheFile hdfs:///tmp/hadoop-run/tandem_params.xml_runs/20150721213842/__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__isb_default_input_kscore.xml#__sl__home__sl__mingze__sl__work__sl__sangzhe-dev__sl__mrtandam-ica-code__sl__mr-tandem__sl__isb_default_input_kscore.xml -cacheFile hdfs:///tmp/hadoop-run/tandem_params.xml_runs/20150721213842/reducer1_1#reducer1_1
15/07/21 22:39:14 WARN streaming.StreamJob: -cacheFile option is deprecated, please use -files instead.
15/07/21 22:39:14 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
packageJobJar: [/tmp/hadoop-mingze/hadoop-unjar3765076056128810791/] [] /tmp/streamjob8539250200240596486.jar tmpDir=null
15/07/21 22:39:15 ERROR streaming.StreamJob: Error launching job , bad input path : File does not exist: /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/reducer1_1
Streaming Command Failed!
15/07/21 21:39:15 return code 2
15/07/21 21:39:15 problem running command:
15/07/21 21:39:15 /opt/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/results/part-00000 /tmp/tmpEZIcEo
15/07/21 21:39:15 copyToLocal: null
15/07/21 21:39:15 return code 255
15/07/21 21:39:26 problem running command:
15/07/21 21:39:26 /opt/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/results/part-00000 /tmp/tmp1_CdK6
15/07/21 21:39:26 copyToLocal: null
15/07/21 21:39:26 return code 255
15/07/21 21:39:37 problem running command:
15/07/21 21:39:37 /opt/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/results/part-00000 /tmp/tmpsU0MKv
15/07/21 21:39:37 copyToLocal: null
15/07/21 21:39:37 return code 255
15/07/21 21:39:47 problem running command:
15/07/21 21:39:47 /opt/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/results/part-00000 /tmp/tmpKSFk3J
15/07/21 21:39:47 copyToLocal: null
15/07/21 21:39:47 return code 255
15/07/21 21:39:58 problem running command:
15/07/21 21:39:58 /opt/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /tmp/hadoop-run/tandem_params.xml_runs/20150721213842/results/part-00000 /tmp/tmp3XUjKm
15/07/21 21:39:58 copyToLocal: null
15/07/21 21:39:58 return code 255
15/07/21 21:39:58 [Errno 2] No such file or directory: '/tmp/tmp3XUjKm'
15/07/21 21:39:58 elapsed time = 0:01:15.927155
From the error it's clear that the first map/reduce step executed successfully; the failure is in the second step, so something in that very long command appears to be wrong.
I can't figure out what's going on with this problem either; trying a few more times will eventually produce a successful run.
I hadn't noticed this before:
15/07/21 22:39:14 WARN streaming.StreamJob: -cacheFile option is deprecated, please use -files instead.
15/07/21 22:39:14 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
packageJobJar: [/tmp/hadoop-mingze/hadoop-unjar3765076056128810791/] [] /tmp/streamjob8539250200240596486.jar tmpDir=null
I looked through the source code of the boto package as well as of Hadoop. boto's boto.emr.StreamingStep is what assembles the complete streaming command. However, both the 2.0rc version we use and the latest 2.38.0 emit the options -cacheFile and -cacheArchive, which are what old Hadoop releases used. I downloaded hadoop-0.10.1 and checked the hadoop-streaming source there; it indeed uses -cacheFile and -jobconf. But in the hadoop-0.20.1+152 we use, the options have changed: -cacheFile became -files and -jobconf became -D. So I suspect the program's instability may be related to this, i.e. the compatibility isn't particularly good, although after a few retries a run can still complete successfully.
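If the stale option names are indeed the culprit, one low-risk workaround would be to rewrite the generated argument list before handing it to hadoop. This is a sketch of that idea based on my reading of the deprecation warnings above, not code from mr-tandem or boto:

```python
# Map the pre-0.20 streaming option names that boto's StreamingStep
# emits onto their modern equivalents, per the deprecation warnings:
#   -cacheFile -> -files, -jobconf -> -D
# (-cacheArchive -> -archives is the analogous case for archives).
OPTION_MAP = {
    "-cacheFile": "-files",
    "-cacheArchive": "-archives",
    "-jobconf": "-D",
}

def modernize_streaming_args(args):
    # Replace deprecated option tokens, leaving their values untouched.
    return [OPTION_MAP.get(a, a) for a in args]
```

Note that newer Hadoop expects a single comma-separated -files list rather than repeated options, so a real fix would also need to merge repeated occurrences; this sketch only renames the tokens.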
Also, when this problem appears, there isn't much to do either, except run it a few more times:
15/07/12 04:58:28 unable to invoke local copy of hadoop for file transfer purposes
15/07/12 04:58:28 global name 'lastlogtime' is not defined
15/07/12 04:58:28 !!!
15/07/12 04:58:28 Hadoop cluster access doesn't seem to be set up!
15/07/12 04:58:28 !!!
15/07/12 04:58:28 if you haven't already initiated a proxy to your hadoop gateway, open another shell and leave the following command running in it:
15/07/12 04:58:28 "ssh -D 6789 -n -N <your_hadoop_username>@<your_hadoop_gateway>"
15/07/12 04:58:28 see https://univsupport.hipods.ihost.com/documents/7/ for details on hadoop gateway proxies
15/07/12 04:58:28 and you'll need to install hadoop on your local machine, to get the hadoop file transfer commands working.
15/07/12 04:58:28 see http://hadoop.apache.org/common/docs/r0.15.2/quickstart.html and http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-windows/
Sorry, I've been trying here for a long time and still get the same error.
I've thought of an alternative: test on a machine on our campus LAN, so that both of us can see it. When you have time, please set up mrtandem on this server.
Server address: 172.16.98.3, root password: ***
Hadoop startup steps:
|- ssh in as root to server-200 (98.3)
|- cd to /usr/hadoop/hadoop-0.20.2 and run ./bin/start-all.sh
|- type jps in the shell; startup is complete when the following processes are listed:
-Jps
-JobTracker
-NameNode
-DataNode
-SecondaryNameNode
Hadoop can also be checked with its bundled example to verify it started fully:
|- cd to /home/ouyc/hadoop-1.2.1 and run
hadoop fs -put README.txt /
|- cd to /home/ouyc/hadoop-1.2.1 and run
hadoop jar hadoop-examples-1.2.1.jar wordcount /README.txt /ouputdir
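The smoke test above can be scripted so it's repeatable. A sketch that just assembles the two commands from the steps (the paths and jar name are the ones quoted above and may need adjusting per machine):

```python
def smoke_test_commands(hadoop_home, examples_jar="hadoop-examples-1.2.1.jar"):
    # Build the two commands from the startup notes: upload README.txt
    # to HDFS, then run the bundled wordcount example on it.
    hadoop = "%s/bin/hadoop" % hadoop_home
    return [
        [hadoop, "fs", "-put", "README.txt", "/"],
        [hadoop, "jar", examples_jar, "wordcount", "/README.txt", "/ouputdir"],
    ]
```

Each list could then be passed to subprocess.call from the hadoop_home directory.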
Could you provide a way to access the campus server from off campus? I just got home today and will be staying for a week.
15/07/29 22:23:48 set core hadoop_home=/home/mingze/hadoop/hadoop-0.20.1+152
15/07/29 22:23:48 set core xtandemParametersLocalPath=tandem_params.xml
15/07/29 22:23:48 set core verbose=True
15/07/29 22:23:48 set core runLocal=False
15/07/29 22:23:48 set core baseName=tandem_params.xml
15/07/29 22:23:48 set core coreBaseName=tandem_params.xml
15/07/29 22:23:48 set core jobTimeStamp=20150729222348
15/07/29 22:23:48 debug mode on
15/07/29 22:23:48 set core jobDir=tandem_params.xml_runs/20150729222348
15/07/29 22:23:48 set core resultsFilename=tandem_params.xml_runs/20150729222348/results.txt
15/07/29 22:23:48 set core sharedDir=
15/07/29 22:23:48 set core verbose=True
15/07/29 22:23:48 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop dfs -test -d hdfs:///home/mingze/hadoop
15/07/29 22:23:48 return code 0
15/07/29 22:23:48 set core jobDir=tandem_params.xml_runs/20150729222348
15/07/29 22:23:48 set core verbose=True
15/07/29 22:23:48 set 0 baseName=tandem_params.xml
15/07/29 22:23:48 set 0 eca_cfgName=tandem_params.xml
15/07/29 22:23:48 set 0 xtandemParametersLocalPath=tandem_params.xml
15/07/29 22:23:48 set 0 jobDir=tandem_params.xml_runs/20150729222348
15/07/29 22:23:48 set 0 refineSetting=
15/07/29 22:23:48 set 0 mainXtandemParametersName=slhomeslmingzeslmr-tandemsltandem_params.xml
15/07/29 22:23:48 set core verbose=True
15/07/29 22:23:48 set 0 sharedFile_spectrum1=/home/mingze/mr-tandem/test_spectra.mgf.gz
15/07/29 22:23:48 set 0 outputName=slhomeslmingzeslmr-tandemsloutput
15/07/29 22:23:48 set 0 outputLocalPath=/home/mingze/mr-tandem/output
15/07/29 22:23:48 set 0 refineSetting=no
15/07/29 22:23:48 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -put /tmp/tmp0PKHiB /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemsltandem_params.xml
15/07/29 22:23:49 return code 0
15/07/29 22:23:49 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -put /tmp/tmpm8EkSa /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemslisb_default_input_kscore.xml
15/07/29 22:23:50 return code 0
15/07/29 22:23:50 set 0 sharedFile_database2=/home/mingze/work/project1/mrtandem_fasta/crap.fasta.pro
15/07/29 22:23:50 set 0 sharedFile_database3=/home/mingze/work/project1/mrtandem_fasta/scd_1.fasta.pro
15/07/29 22:23:50 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -put /tmp/tmplph25b /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemsltaxonomy.xml
15/07/29 22:23:51 return code 0
15/07/29 22:23:51 set 0 configNameJSON=tandem_params.xml.cfg.json
15/07/29 22:23:51 set 0 tandem_file=/home/mingze/work/project1/mrtandem_bin/tandem
15/07/29 22:23:51 checking for existing remote copies of data and script files
15/07/29 22:23:51 checking for file /home/mingze/work/project1/mrtandem_bin/tandem_20150709191533 on remote system ...
15/07/29 22:23:51 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop dfs -ls /home/mingze/hadoop/home/mingze/work/project1/mrtandem_bin/tandem_20150709191533 2> /dev/null
15/07/29 22:23:52 ok -- file found on target system
15/07/29 22:23:52 existing HDFS copy of file /home/mingze/work/project1/mrtandem_bin/tandem verified with correct size, good
15/07/29 22:23:52 set 0 tandem_file=/home/mingze/work/project1/mrtandem_bin/tandem_20150709191533
15/07/29 22:23:52 set core oldSkool=False
15/07/29 22:23:52 checking for existing remote copies of data and script files
15/07/29 22:23:52 checking for file /home/mingze/work/project1/mrtandem_fasta/scd_1.fasta.pro on remote system ...
15/07/29 22:23:52 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop dfs -ls /home/mingze/hadoop/home/mingze/work/project1/mrtandem_fasta/scd_1.fasta.pro 2> /dev/null
15/07/29 22:23:52 ok -- file found on target system
15/07/29 22:23:52 existing HDFS copy of file /home/mingze/work/project1/mrtandem_fasta/scd_1.fasta.pro verified with correct size, good
15/07/29 22:23:52 set 0 sharedFile_database3=/home/mingze/work/project1/mrtandem_fasta/scd_1.fasta.pro
15/07/29 22:23:52 checking for existing remote copies of data and script files
15/07/29 22:23:52 checking for file /home/mingze/work/project1/mrtandem_fasta/crap.fasta.pro on remote system ...
15/07/29 22:23:52 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop dfs -ls /home/mingze/hadoop/home/mingze/work/project1/mrtandem_fasta/crap.fasta.pro 2> /dev/null
15/07/29 22:23:53 ok -- file found on target system
15/07/29 22:23:53 existing HDFS copy of file /home/mingze/work/project1/mrtandem_fasta/crap.fasta.pro verified with correct size, good
15/07/29 22:23:53 set 0 sharedFile_database2=/home/mingze/work/project1/mrtandem_fasta/crap.fasta.pro
15/07/29 22:23:53 checking for existing remote copies of data and script files
15/07/29 22:23:53 checking for file /home/mingze/mr-tandem/test_spectra.mgf.gz on remote system ...
15/07/29 22:23:53 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop dfs -ls /home/mingze/hadoop/home/mingze/mr-tandem/test_spectra.mgf.gz 2> /dev/null
15/07/29 22:23:54 ok -- file found on target system
15/07/29 22:23:54 existing HDFS copy of file /home/mingze/mr-tandem/test_spectra.mgf.gz verified with correct size, good
15/07/29 22:23:54 set 0 sharedFile_spectrum1=/home/mingze/mr-tandem/test_spectra.mgf.gz
15/07/29 22:23:54 set core aws_access_key_id=xxxx
15/07/29 22:23:54 set core aws_secret_access_key=xxxx
15/07/29 22:23:54 set core RSAKey=xxxx
15/07/29 22:23:54 set core RSAKeyName=xxxx
15/07/29 22:23:54 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -put /tmp/tmp274FSE /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/tandem_params.xml.cfg.json
15/07/29 22:23:55 return code 0
15/07/29 22:23:55 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -put /tmp/tmpO1tcWr /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/mapper1-input-values
15/07/29 22:23:56 return code 0
15/07/29 22:23:56 set 0 jobDir=tandem_params.xml_runs/20150729222348
15/07/29 22:23:56 set core verbose=True
15/07/29 22:23:56 scripts, config and logs will be written to tandem_params.xml_runs/20150729222348
15/07/29 22:23:56 set 0 jobDir=tandem_params.xml_runs/20150729222348
15/07/29 22:23:56 set 0 jobDir=tandem_params.xml_runs/20150729222348
15/07/29 22:23:56 set 0 finalReportURL=hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemsloutput
15/07/29 22:23:56 set core oldSkool=False
15/07/29 22:23:56 set 0 jobDir=tandem_params.xml_runs/20150729222348
15/07/29 22:23:56 set 0 jobDir=tandem_params.xml_runs/20150729222348
15/07/29 22:23:56 set 0 jobDir=tandem_params.xml_runs/20150729222348
15/07/29 22:23:56 set 0 finalOutputDir=hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/results
15/07/29 22:23:56 set 0 resultsDir=tandem_params.xml_runs/20150729222348/results
15/07/29 22:23:56 set 0 completed=False
15/07/29 22:23:56 begin
execution 15/07/29 22:23:56 running Hadoop step tandem_params.xml-condition 15/07/29 22:23:56 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop jar /home/mingze/hadoop/hadoop-0.20.1+152/contrib/streaming/hadoop-0.20.1+152-streaming.jar -jobconf mapred.task.timeout=360000000 -jobconf mapred.reduce.tasks=1 -jobconf mapred.map.tasks=1 -jobconf mapred.reduce.tasks.speculative.execution=false -jobconf mapred.map.tasks.speculative.execution=false -mapper tandem_20150709191533 -mapper1_1 hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348 slhomeslmingzeslmr-tandemsltandem_params.xml -reducer tandem_20150709191533 -reducer1_1.1 hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348 slhomeslmingzeslmr-tandemsltandem_params.xml -input hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/mapper1-input-values -output hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/output1.1/ -cacheFile hdfs:///home/mingze/hadoop//home/mingze/mr-tandem/test_spectra.mgf.gz#slhomeslmingzeslmr-tandemsltest_spectra.mgf.gz -cacheFile hdfs:///home/mingze/hadoop//home/mingze/work/project1/mrtandem_fasta/crap.fasta.pro#slhomeslmingzeslworkslproject1slmrtandem_fastaslcrap.fasta.pro -cacheFile hdfs:///home/mingze/hadoop//home/mingze/work/project1/mrtandem_fasta/scd_1.fasta.pro#slhomeslmingzeslworkslproject1slmrtandem_fastaslscd_1.fasta.pro -cacheFile hdfs:///home/mingze/hadoop//home/mingze/work/project1/mrtandem_bin/tandem_20150709191533#tandem_20150709191533 -cacheFile hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemsltandem_params.xml#slhomeslmingzeslmr-tandemsltandem_params.xml -cacheFile hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemsltaxonomy.xml#slhomeslmingzeslmr-tandemsltaxonomy.xml -cacheFile 
hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemslisb_default_input_kscore.xml#slhomeslmingzeslmr-tandemslisb_default_input_kscore.xml 15/07/29 23:23:56 WARN streaming.StreamJob: -cacheFile option is deprecated, please use -files instead. 15/07/29 23:23:56 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead. 15/07/29 22:23:56 packageJobJar: [/home/mingze/hadoop/hadoop-tmp/hadoop-unjar3112717406557694130/] [] /tmp/streamjob4821664193025231148.jar tmpDir=null 15/07/29 23:23:57 INFO mapred.FileInputFormat: Total input paths to process : 1 15/07/29 23:23:57 INFO streaming.StreamJob: getLocalDirs(): [/home/mingze/hadoop/hadoop-tmp/mapred/local] 15/07/29 23:23:57 INFO streaming.StreamJob: Running job: job_201507292312_0007 15/07/29 23:23:57 INFO streaming.StreamJob: To kill this job, run: 15/07/29 23:23:57 INFO streaming.StreamJob: /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop job -Dmapred.job.tracker=hdfs://localhost:8021 -kill job_201507292312_0007 15/07/29 23:23:57 INFO streaming.StreamJob: Tracking URL: http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201507292312_0007 15/07/29 23:23:58 INFO streaming.StreamJob: map 0% reduce 0% 15/07/29 23:24:05 INFO streaming.StreamJob: map 100% reduce 0% 15/07/29 23:24:17 INFO streaming.StreamJob: map 100% reduce 100% 15/07/29 23:24:20 INFO streaming.StreamJob: Job complete: job_201507292312_0007 15/07/29 23:24:20 INFO streaming.StreamJob: Output: hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/output1.1/ 15/07/29 22:24:20 return code 0 15/07/29 22:24:20 running Hadoop step tandem_params.xml-process-final 15/07/29 22:24:20 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop jar /home/mingze/hadoop/hadoop-0.20.1+152/contrib/streaming/hadoop-0.20.1+152-streaming.jar -jobconf mapred.task.timeout=360000000 -jobconf mapred.reduce.tasks=1 -jobconf mapred.map.tasks=1 -jobconf mapred.reduce.tasks.speculative.execution=false -jobconf 
mapred.map.tasks.speculative.execution=false -mapper tandem_20150709191533 -mapper2_1 hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348 slhomeslmingzeslmr-tandemsltandem_params.xml -reducer tandem_20150709191533 -reducer2_1.1 hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348 slhomeslmingzeslmr-tandemsltandem_params.xml -reportURL hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemsloutput -input hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/output1.1/ -output hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/results -cacheFile hdfs:///home/mingze/hadoop//home/mingze/mr-tandem/test_spectra.mgf.gz#slhomeslmingzeslmr-tandemsltest_spectra.mgf.gz -cacheFile hdfs:///home/mingze/hadoop//home/mingze/work/project1/mrtandem_fasta/crap.fasta.pro#slhomeslmingzeslworkslproject1slmrtandem_fastaslcrap.fasta.pro -cacheFile hdfs:///home/mingze/hadoop//home/mingze/work/project1/mrtandem_fasta/scd_1.fasta.pro#slhomeslmingzeslworkslproject1slmrtandem_fastaslscd_1.fasta.pro -cacheFile hdfs:///home/mingze/hadoop//home/mingze/work/project1/mrtandem_bin/tandem_20150709191533#tandem_20150709191533 -cacheFile hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemsltandem_params.xml#slhomeslmingzeslmr-tandemsltandem_params.xml -cacheFile hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemsltaxonomy.xml#slhomeslmingzeslmr-tandemsltaxonomy.xml -cacheFile hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/slhomeslmingzeslmr-tandemslisb_default_input_kscore.xml#slhomeslmingzeslmr-tandemslisb_default_input_kscore.xml -cacheFile hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348/reducer1_1#reducer1_1 15/07/29 23:24:21 WARN streaming.StreamJob: -cacheFile option is deprecated, please use -files instead. 
15/07/29 23:24:21 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead. 15/07/29 22:24:21 packageJobJar: [/home/mingze/hadoop/hadoop-tmp/hadoop-unjar2315450143347484676/] [] /tmp/streamjob5361580109200752306.jar tmpDir=null 15/07/29 23:24:21 ERROR streaming.StreamJob: Error launching job , bad input path : File does not exist: /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/reducer1_1 15/07/29 22:24:21 Streaming Command Failed! 15/07/29 22:24:21 problem running command: 15/07/29 22:24:21 set core verbose=True 15/07/29 22:24:21 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/results/part-00000 /tmp/tmpiO2xmL 15/07/29 22:24:22 copyToLocal: null 15/07/29 22:24:22 problem running command: 15/07/29 22:24:32 set core verbose=True 15/07/29 22:24:32 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/results/part-00000 /tmp/tmpFNkFEM 15/07/29 22:24:33 copyToLocal: null 15/07/29 22:24:33 problem running command: 15/07/29 22:24:43 set core verbose=True 15/07/29 22:24:43 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/results/part-00000 /tmp/tmpyEuPok 15/07/29 22:24:44 copyToLocal: null 15/07/29 22:24:44 problem running command: 15/07/29 22:24:54 set core verbose=True 15/07/29 22:24:54 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/results/part-00000 /tmp/tmpiea0xi 15/07/29 22:24:54 copyToLocal: null 15/07/29 22:24:54 problem running command: 15/07/29 22:25:04 set core verbose=True 15/07/29 22:25:04 /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -copyToLocal /home/mingze/hadoop/tandem_params.xml_runs/20150729222348/results/part-00000 /tmp/tmpHHZdjc 15/07/29 22:25:05 copyToLocal: null 15/07/29 22:25:05 problem running command: 15/07/29 22:25:05 
[Errno 2] No such file or directory: '/tmp/tmpHHZdjc' 15/07/29 22:25:05 elapsed time = 0:01:17.537822
Since the first MapReduce step completes, the program itself should be fine; just run it a few more times. When I run it, I also sometimes hit errors in the second MapReduce step, or when fetching the result files back from HDFS after both MapReduce steps have finished.
@sangzhe The hadoop streaming command line contains the following options. I couldn't find an explanation for them in the Hadoop documentation; do you know what they mean, or have any leads?
-mapper1_1 hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348 __sl__home__sl__mingze__sl__mr-tandem__sl__tandem_params.xml
-reducer1_1.1 hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348 __sl__home__sl__mingze__sl__mr-tandem__sl__tandem_params.xml
-mapper2_1 hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348 __sl__home__sl__mingze__sl__mr-tandem__sl__tandem_params.xml
-reducer2_1.1 hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150729222348 __sl__home__sl__mingze__sl__mr-tandem__sl__tandem_params.xml
I don't really understand what they do either. There is also a -reportURL option, which appears in the hadoop streaming command for the final step. You can look at mapreducehelper.cpp and mapreducehandler.cpp under the src_hadoop directory; line 218 of mapreducehandler.cpp has a description of the three MapReduce steps, which I suspect is related.
Could you add the following two lines to general_config.json:
"debug":"True"
"verbose":"True"
and then upload the run log so I can compare?
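For reference, a minimal sketch of how those two lines would sit among the other keys in general_config.json (the hadoop_home value is copied from the run log above; the rest of the file depends on your setup and is not shown):

```json
{
  "hadoop_home": "/home/mingze/hadoop/hadoop-0.20.1+152",
  "debug": "True",
  "verbose": "True"
}
```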
@sangzhe I also need you to confirm your Python and boto versions:
[mingze@localhost mr-tandem]$ python
Python 2.7.10 (default, Jul 5 2015, 14:15:43)
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto
>>> print boto.__version__
2.38.0
>>>
I wonder whether the error is caused by the boto version?
They are indeed different.
[sangzhe@localhost mr-tandem]$ python
Python 2.7.9 (default, Apr 15 2015, 12:08:00)
[GCC 5.0.0 20150319 (Red Hat 5.0.0-0.21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto
>>> print boto.__version__
2.0rc1
The boto version given in install_prerequisite.sh is 2.0rc1.
I switched my boto to 2.38.0 and it still runs through. The Python version shouldn't make a difference.
Now I rather suspect the boost library I'm using. Could you help me check whether the run directory contains this part-00000 file:
[mingze@localhost mr-tandem]$ /home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop fs -ls -r /home/mingze/hadoop/tandem_params.xml_runs/20150804204716/output1.1
ls: Cannot access -r: No such file or directory.
Found 2 items
drwxr-xr-x - mingze supergroup 0 2015-08-04 21:47 /home/mingze/hadoop/tandem_params.xml_runs/20150804204716/output1.1/_logs
-rw-r--r-- 1 mingze supergroup 35 2015-08-04 21:47 /home/mingze/hadoop/tandem_params.xml_runs/20150804204716/output1.1/part-00000
I deleted libboost_serialization.so.1.57.0 from /lib64 on my machine, and the program still runs... could it be that my xtandem isn't using this library?
The output1.1 directory does contain part-00000, with content:
00001 FILE=000000000000:reducer1_1
After I removed libboost_serialization.so.1.57.0, running tandem directly complains that libboost_serialization.so.1.57.0 is missing. Running mr-tandem, the first MapReduce step fails:
15/08/02 23:12:53 INFO streaming.StreamJob: map 0% reduce 0%
15/08/02 23:13:00 INFO streaming.StreamJob: map 100% reduce 0%
15/08/02 23:13:09 INFO streaming.StreamJob: map 100% reduce 33%
15/08/02 23:13:12 INFO streaming.StreamJob: map 100% reduce 0%
15/08/02 23:13:22 INFO streaming.StreamJob: map 100% reduce 33%
15/08/02 23:13:25 INFO streaming.StreamJob: map 100% reduce 0%
15/08/02 23:13:34 INFO streaming.StreamJob: map 100% reduce 33%
15/08/02 23:13:37 INFO streaming.StreamJob: map 100% reduce 0%
15/08/02 23:13:46 INFO streaming.StreamJob: map 100% reduce 33%
15/08/02 23:13:49 INFO streaming.StreamJob: map 100% reduce 0%
15/08/02 23:13:52 INFO streaming.StreamJob: map 100% reduce 100%
15/08/02 23:13:52 INFO streaming.StreamJob: To kill this job, run:
15/08/02 23:13:52 INFO streaming.StreamJob: /home/sangzhe/hadoop/hadoop-0.20.1+152/bin/hadoop job -Dmapred.job.tracker=hdfs://localhost:8021 -kill job_201508022211_0017
15/08/02 23:13:52 INFO streaming.StreamJob: Tracking URL: http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201508022211_0017
15/08/02 23:13:52 ERROR streaming.StreamJob: Job not Successful!
15/08/02 23:13:52 INFO streaming.StreamJob: killJob...
Professor, is it always the second MapReduce step that fails now?
Yes, always stop at the second step command.
It may really be a problem with the tandem binary. I downloaded a non-hadoopized tandem, pointed tandem_url in general_config at it, and ran mr-tandem: it also completes the first step and then fails at the second.
I've basically found why my runs fail: popen didn't upload the file successfully. Please paste your log too, so we can check whether there should be a localhost in the middle of the hdfs:/// URL.
Here is the code I added to mapreducehelper.cpp (around line 740):
#ifndef MSVC // hadoop and linux stuff here
if (!local) { // copy out to HDFS
std::string cmd("hadoop dfs -put ");
cmd += oname;
cmd += " ";
cmd += odir;
cerr << timestamp() << "tring to do this" << cmd <<"\n";
if (FILE *file = fopen(oname.c_str(), "r")) {
fclose(file);
cerr << timestamp() << oname <<"exist" <<"\n";
} else {
cerr << timestamp() << oname <<"not exist" <<"\n";
}
popen(cmd.c_str(),"r");
}
#endif
Here is the output:
[mingze@localhost userlogs]$ cat attempt_201508052122_0009_r_000000_0/stderr
15/08/05 21:14:50 reducer1_1: X! TANDEM 2010.10.01.1 (LabKey, Insilicos and ISB)
15/08/05 21:14:50 reducer1_1: read lines from mapper_1 to estabish mapper count
15/08/05 21:14:50 reducer1_1: 1 mappers reporting in with 1 unique hostnames
15/08/05 21:14:50 reducer1_1: Loading spectra .15/08/05 21:14:50 reducer1_1: loaded.
15/08/05 21:14:50 reducer1_1: Spectra matching criteria = 653
15/08/05 21:14:50 reducer1_1: Pluggable scoring enabled.
15/08/05 21:14:50 reducer1_1: Starting threads .15/08/05 21:14:50 reducer1_1: send 1 processes to mapper_2
15/08/05 21:14:50 reducer1_1: output a process with 653 spectra for task 00001
15/08/05 21:14:50 reducer1_1: tring to do thishadoop dfs -put /home/mingze/hadoop/hadoop-0.20.1+152/hadoop-tmp/mapred/local/taskTracker/jobcache/job_201508052122_0009/work/reducer1_1 hdfs:///home/mingze/hadoop/tandem_params.xml_runs/20150805211425
15/08/05 21:14:50 reducer1_1: /home/mingze/hadoop/hadoop-0.20.1+152/hadoop-tmp/mapred/local/taskTracker/jobcache/job_201508052122_0009/work/reducer1_1exist
15/08/05 21:14:50 reducer1_1: done
The build fails. It currently reports: serialize.h:145:47: fatal error: boost/serialization/base_object.hpp: No such file or directory. Compilation terminated.
I see that the tandem under the bin directory is freshly compiled, but when I used it directly the output was no different.
@baimingze Professor, which version of the boost library are you using? I'm still missing files when compiling.
Did you see the src_hadoop/boost/ directory after unpacking? That file is in that directory.
Problem finally solved: it was a hadoop path issue. My C++ program had trouble obtaining PATH; putting the absolute hadoop path into the C++ source by hand fixed it:
if (!local) { // copy out to HDFS
std::string cmd("/home/mingze/hadoop/hadoop-0.20.1+152/bin/hadoop dfs -put ");
cmd += oname;
cmd += " ";
cmd += odir;
cerr << timestamp() << "tring to do this" << cmd <<"\n";
It works now.
[mingze@localhost mr-tandem]$ python mr-tandem.py general_config.json tandem_params.xml
15/08/06 22:01:29 uploading /home/mingze/work/xtandem_modified/bin/tandem to /home/mingze/hadoop//home/mingze/work/xtandem_modified/bin/tandem_20150806220059
15/08/06 22:01:35 scripts, config and logs will be written to tandem_params.xml_runs/20150806220125
15/08/06 22:01:35 begin execution
15/08/06 22:01:35 running Hadoop step tandem_params.xml-condition
15/08/06 23:01:36 INFO streaming.StreamJob: Running job: job_201508052122_0021
15/08/06 23:01:37 INFO streaming.StreamJob: map 0% reduce 0%
15/08/06 23:01:45 INFO streaming.StreamJob: map 100% reduce 0%
15/08/06 23:01:56 INFO streaming.StreamJob: map 100% reduce 100%
15/08/06 23:01:59 INFO streaming.StreamJob: Job complete: job_201508052122_0021
15/08/06 22:01:59 running Hadoop step tandem_params.xml-process-final
15/08/06 23:02:00 INFO streaming.StreamJob: Running job: job_201508052122_0022
15/08/06 23:02:01 INFO streaming.StreamJob: map 0% reduce 0%
15/08/06 23:02:11 INFO streaming.StreamJob: map 100% reduce 0%
15/08/06 23:02:23 INFO streaming.StreamJob: map 100% reduce 100%
15/08/06 23:02:26 INFO streaming.StreamJob: Job complete: job_201508052122_0022
15/08/06 22:02:31 Done. X!Tandem logs:
15/08/06 22:01:51 reducer1_1: X! TANDEM 2010.10.01.1 (LabKey, Insilicos and ISB)
15/08/06 22:01:51 reducer1_1: read lines from mapper_1 to estabish mapper count
15/08/06 22:01:51 reducer1_1: 1 mappers reporting in with 1 unique hostnames
15/08/06 22:01:51 reducer1_1: Loading spectra .15/08/06 22:01:51 reducer1_1: loaded.
15/08/06 22:01:51 reducer1_1: Spectra matching criteria = 653
15/08/06 22:01:51 reducer1_1: Pluggable scoring enabled.
15/08/06 22:01:51 reducer1_1: Starting threads .15/08/06 22:01:51 reducer1_1: send 1 processes to mapper_2
15/08/06 22:01:51 reducer1_1: output a process with 653 spectra for task 00001
15/08/06 22:02:06 mapper2_1: X! TANDEM 2010.10.01.1 (LabKey, Insilicos and ISB)
15/08/06 22:02:06 mapper2_1: loading processes...15/08/06 22:02:06 mapper2_1.00001: loaded process with spectra count = 653
done.
15/08/06 22:02:06 mapper2_1.00001: Computing models on 653 spectra:
1215/08/06 22:02:07 mapper2_1.00001: output a process with 653 spectra for task 00001
15/08/06 22:02:18 reducer2_1: X! TANDEM 2010.10.01.1 (LabKey, Insilicos and ISB)
15/08/06 22:02:18 reducer2_1: loading processes...15/08/06 22:02:19 reducer2_1: loaded process with spectra count = 653
done.
15/08/06 22:02:19 reducer2_1: Creating report:
initial calculations ..... done.
sorting ..... done.
finding repeats ..... done.
evaluating results ..... done.
calculating expectations ..... done.
writing results writing output to hadoop task directory as /home/mingze/hadoop/hadoop-0.20.1+152/hadoop-tmp/mapred/local/taskTracker/jobcache/job_201508052122_0022/work/output
..... done.
15/08/06 22:02:19 reducer2_1:
Valid models = 9
15/08/06 22:02:31 (this log is also written to tandem_params.xml_runs/20150806220125/results.txt)
15/08/06 22:02:31 Results written to output
15/08/06 22:02:31 elapsed time = 0:01:05.157084
general_config.json requires a hadoop_dr setting; which path should that be? Right now the run fails when copying files to HDFS, and I suspect this directory path is written incorrectly.
The attached image is the hadoop log.