RevolutionAnalytics / RHadoop
https://github.com/RevolutionAnalytics/RHadoop/wiki

Error in mr(map = map, reduce = reduce, combine = combine, in.folder = if (is.list(input)) { : #149

Closed: pietheinstrengholt closed this issue 11 years ago.

pietheinstrengholt commented 11 years ago

I installed RHadoop (rmr2, rhdfs) on my single-node Ubuntu Hadoop cluster, along with RStudio Server, and I'm running as the hadoop user. The R packages are all installed in the system library (/usr/lib/R/site-library). When I run a simple example, the Hadoop streaming command fails; details below. I've also posted my environment variables. Any idea what might be wrong?

small.ints = to.dfs(1:1000)
12/10/25 14:57:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/25 14:57:22 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/25 14:57:22 INFO compress.CodecPool: Got brand-new compressor
mapreduce(input=small.ints, map=function(k,v) keyval(v, v^2))

packageJobJar: [/tmp/RtmpgIWiYg/rmr-local-env, /tmp/RtmpgIWiYg/rmr-global-env, /tmp/RtmpgIWiYg/rhstr.map2238387a5905, /app/hadoop/tmp/hadoop-unjar2952034430760324543/] [] /tmp/streamjob2978332954638050808.jar tmpDir=null
12/10/25 14:57:59 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/25 14:58:00 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
12/10/25 14:58:00 INFO streaming.StreamJob: Running job: job_201210251113_0016
12/10/25 14:58:00 INFO streaming.StreamJob: To kill this job, run:
12/10/25 14:58:00 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=node1:54311 -kill job_201210251113_0016
12/10/25 14:58:00 INFO streaming.StreamJob: Tracking URL: http://node1:50030/jobdetails.jsp?jobid=job_201210251113_0016
12/10/25 14:58:01 INFO streaming.StreamJob: map 0% reduce 0%
12/10/25 14:59:09 INFO streaming.StreamJob: map 100% reduce 100%
12/10/25 14:59:09 INFO streaming.StreamJob: To kill this job, run:
12/10/25 14:59:09 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=node1:54311 -kill job_201210251113_0016
12/10/25 14:59:09 INFO streaming.StreamJob: Tracking URL: http://node1:50030/jobdetails.jsp?jobid=job_201210251113_0016
12/10/25 14:59:09 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201210251113_0016_m_000001
12/10/25 14:59:09 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, in.folder = if (is.list(input)) { :
  hadoop streaming failed with error code 1

I also ran the environment-dump example inside a map task, and it fails the same way:

values(from.dfs(mapreduce(to.dfs(1), map = function(k,v) Sys.getenv())))

12/10/25 14:54:36 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/25 14:54:36 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/25 14:54:36 INFO compress.CodecPool: Got brand-new compressor
packageJobJar: [/tmp/RtmpgIWiYg/rmr-local-env, /tmp/RtmpgIWiYg/rmr-global-env, /tmp/RtmpgIWiYg/rhstr.map22386822354f, /app/hadoop/tmp/hadoop-unjar6021404261473777239/] [] /tmp/streamjob1986913280492305242.jar tmpDir=null
12/10/25 14:54:40 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/25 14:54:40 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
12/10/25 14:54:40 INFO streaming.StreamJob: Running job: job_201210251113_0015
12/10/25 14:54:40 INFO streaming.StreamJob: To kill this job, run:
12/10/25 14:54:40 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=node1:54311 -kill job_201210251113_0015
12/10/25 14:54:40 INFO streaming.StreamJob: Tracking URL: http://node1:50030/jobdetails.jsp?jobid=job_201210251113_0015
12/10/25 14:54:41 INFO streaming.StreamJob: map 0% reduce 0%
12/10/25 14:55:48 INFO streaming.StreamJob: map 100% reduce 100%
12/10/25 14:55:48 INFO streaming.StreamJob: To kill this job, run:
12/10/25 14:55:48 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=node1:54311 -kill job_201210251113_0015
12/10/25 14:55:48 INFO streaming.StreamJob: Tracking URL: http://node1:50030/jobdetails.jsp?jobid=job_201210251113_0015
12/10/25 14:55:48 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201210251113_0015_m_000000
12/10/25 14:55:48 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, in.folder = if (is.list(input)) { :
  hadoop streaming failed with error code 1

Here's the output of all my environment variables:

Sys.getenv()
DISPLAY ":0"
EDITOR "vi"
GIT_ASKPASS "rpostback-askpass"
HADOOP_CMD "/usr/local/hadoop/bin/hadoop"
HADOOP_HOME "/usr/local/hadoop"
HADOOP_STREAMING "/usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.3.jar"
HOME "/home/hduser"
LC_CTYPE "en_US.UTF-8"
LD_LIBRARY_PATH "/usr/lib/R/lib:/lib:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/i386/server:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/i386:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib:@JAVA_LD@"
LN_S "ln -s"
LOGNAME "hduser"
MAKE "make"
NLSPATH "/usr/dt/lib/nls/msg/%L/%N.cat"
PAGER "/usr/bin/pager"
PATH "/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin"
RSTUDIO_USER_IDENTITY "hduser"
RS_RPOSTBACK_PATH "/usr/lib/rstudio-server/bin/rpostback"
R_BROWSER "xdg-open"
R_BZIPCMD "/bin/bzip2"
R_DOC_DIR "/usr/share/R/doc"
R_GZIPCMD "/bin/gzip"
R_HOME "/usr/lib/R"
R_INCLUDE_DIR "/usr/share/R/include"
R_LIBS_SITE "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
R_LIBS_USER "~/R/library"
R_PAPERSIZE "letter"
R_PAPERSIZE_USER "letter"
R_PDFVIEWER "/usr/bin/xdg-open"
R_PLATFORM "i686-pc-linux-gnu"
R_PRINTCMD "/usr/bin/lpr"
R_RD4DVI "ae"
R_RD4PDF "times,inconsolata,hyper"
R_SESSION_TMPDIR "/tmp/RtmpgIWiYg"
R_SHARE_DIR "/usr/share/R/share"
R_SYSTEM_ABI "linux,gcc,gxx,gfortran,?"
R_TEXI2DVICMD "/usr/bin/texi2dvi"
R_UNZIPCMD "/usr/bin/unzip"
R_ZIPCMD "/usr/bin/zip"
SED "/bin/sed"
SSH_ASKPASS "rpostback-askpass"
TAR "/bin/tar"
USER "hduser"
XFILESEARCHPATH "/usr/dt/app-defaults/%L/Dt"
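For what it's worth, rmr2 reads HADOOP_CMD and HADOOP_STREAMING when the package loads, and RStudio Server sessions often do not inherit a login shell's exported variables. A minimal sketch that pins them from inside R, using the paths from the dump above (these paths are specific to this machine, not defaults):

# Set the Hadoop locations before loading rmr2; RStudio Server sessions
# frequently do not see shell-exported variables, so set them in R.
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop",
           HADOOP_STREAMING = "/usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.3.jar")
library(rmr2)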

pietheinstrengholt commented 11 years ago

And here are the job details:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

syslog logs:

doop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201210251113_0014/jars/job.jar <- /app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201210251113_0014/attempt_201210251113_0014_m_000000_0/work/job.jar
2012-10-25 14:50:20,663 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201210251113_0014/jars/rmr-global-env <- /app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201210251113_0014/attempt_201210251113_0014_m_000000_0/work/rmr-global-env
2012-10-25 14:50:20,692 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201210251113_0014/jars/rmr-local-env <- /app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201210251113_0014/attempt_201210251113_0014_m_000000_0/work/rmr-local-env
2012-10-25 14:50:20,720 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201210251113_0014/jars/META-INF <- /app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201210251113_0014/attempt_201210251113_0014_m_000000_0/work/META-INF
2012-10-25 14:50:21,508 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-10-25 14:50:22,149 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2012-10-25 14:50:22,202 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1684706
2012-10-25 14:50:22,861 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2012-10-25 14:50:22,864 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2012-10-25 14:50:22,888 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2012-10-25 14:50:22,927 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2012-10-25 14:50:23,395 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2012-10-25 14:50:23,395 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2012-10-25 14:50:23,495 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/Rscript, rhstr.map22382b1db0ee]
2012-10-25 14:50:23,629 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2012-10-25 14:50:24,359 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2012-10-25 14:50:24,365 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2012-10-25 14:50:24,492 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-10-25 14:50:24,759 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2012-10-25 14:50:24,760 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hduser for UID 1001 from the native implementation
2012-10-25 14:50:24,787 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-10-25 14:50:24,803 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

stderr logs:

Error in library(rmr) : there is no package called ‘rmr’
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
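The stderr log is the actual failure: the streaming map task launches /usr/bin/Rscript (see the PipeMapRed exec line in the syslog), and that interpreter cannot find an rmr package. A minimal sketch for confirming what that interpreter actually sees, assuming the same single node (check_libs.R is a hypothetical helper script):

# check_libs.R -- run with the exact interpreter the task uses:
#   /usr/bin/Rscript check_libs.R
print(.libPaths())                        # library paths this Rscript searches
print(rownames(installed.packages()))     # packages visible in those libraries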

pietheinstrengholt commented 11 years ago

Already found it. I also had to install rmr (version 1):

R CMD INSTALL -l /usr/lib/R/site-library rmr_1.3.1.tar.gz

Issue can be closed!

piccolbo commented 11 years ago

In 25 years of working with software I have never encountered a case where installing two versions on top of each other was required or was a solution to anything. In the case of rmr we tried to make it an option to help people transition their code, but it is certainly not required. I am at a loss as to what is trying to load the old package in your case, but if I were you I would immediately uninstall rmr 1.3.1 and try to figure out the root cause, and I am here to help you do that. Also, this is a forum for people to report issues from silly to important, but we need to keep in mind that we are also building a knowledge base for a community to rely upon. Let's try not to make general statements that we cannot defend in the long run.
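A minimal sketch of that uninstall, assuming rmr 1.3.1 was installed into /usr/lib/R/site-library as described above:

# Remove the legacy rmr 1.x so that only rmr2 remains on the library path
remove.packages("rmr", lib = "/usr/lib/R/site-library")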

On Thu, Oct 25, 2012 at 7:07 AM, Piethein Strengholt < notifications@github.com> wrote:

Already found it. I also had to install rmr (version 1): R CMD INSTALL -l /usr/lib/R/site-library rmr_1.3.1.tar.gz

Issue can be closed!


KRISHC2010 commented 11 years ago

I am getting Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, : hadoop streaming failed with error code 1 and also Error in to.dfs.path(input) : object 'calc' not found. I am not able to resolve this. I have set the HADOOP_STREAMING environment variable and loaded all the packages. Please help me resolve this issue.
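Note that the second message, object 'calc' not found, is plain R scoping rather than a Hadoop problem: the object passed as input= does not exist in the session. A minimal sketch of the fix, with calc standing in for whatever name the failing script used:

# The input object must exist before mapreduce() is called; 'calc' here
# is a hypothetical stand-in for the missing object in the failing script.
calc <- to.dfs(1:100)
mapreduce(input = calc, map = function(k, v) keyval(v, v^2))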

jackyim commented 8 years ago

Hi, I'm getting this error too. Any solution for this? It's urgent.

Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
  hadoop streaming failed with error code 1

krish588 commented 7 years ago

I have encountered the same issue. Did you get an answer?