RevolutionAnalytics / rmr2

A package that allows R developers to use Hadoop MapReduce

Java heap space Error only when there is a reducer in mapreduce #148

Closed piccolbo closed 9 years ago

piccolbo commented 9 years ago

Originally reported here

derrickoswald commented 9 years ago

As suggested by piccolbo, I doubled the backend parameters:

#' Simple test of RHadoop.
example = function ()
{
  evenodd = function (v)
  {
    ret = if (0 == bitwAnd (as.integer (v), 1)) "even" else "odd"
    return (ret)
  }

  mapper = function (k, v)
  {
    keyval (unlist (lapply (v, evenodd)), v)
  }

  reducer = function (key, values)
  {
    keyval (key, length (values)) # sum (values))
  }

  rmr.options (
    backend.parameters = list (
      hadoop = list (
        D = "mapreduce.map.java.opts=-Xmx800M",
        D = "mapreduce.reduce.java.opts=-Xmx800M",
        D = "mapreduce.map.memory.mb=8192",
        D = "mapreduce.reduce.memory.mb=8192"
      )
    )
  )

  ints = to.dfs (1:100)
  #calc = mapreduce (input = ints, map = mapper) 
  calc = mapreduce (input = ints, map = mapper, reduce = reducer) 
  print (from.dfs (calc))
}

But the error is still the same:

> hdfs.init (); test::example ()
14/11/14 09:00:59 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/11/14 09:00:59 INFO compress.CodecPool: Got brand-new compressor [.deflate]
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.4.0.2.1.5.0-695.jar] /tmp/streamjob8649825258584273018.jar tmpDir=null
14/11/14 09:01:02 INFO impl.TimelineClientImpl: Timeline service address: http://hrn.bkw-hdp.ch:8188/ws/v1/timeline/
14/11/14 09:01:02 INFO client.RMProxy: Connecting to ResourceManager at hrn.bkw-hdp.ch/10.10.0.12:8050
14/11/14 09:01:03 INFO impl.TimelineClientImpl: Timeline service address: http://hrn.bkw-hdp.ch:8188/ws/v1/timeline/
14/11/14 09:01:03 INFO client.RMProxy: Connecting to ResourceManager at hrn.bkw-hdp.ch/10.10.0.12:8050
14/11/14 09:01:03 INFO mapred.FileInputFormat: Total input paths to process : 1
14/11/14 09:01:04 INFO mapreduce.JobSubmitter: number of splits:2
14/11/14 09:01:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415823260172_0004
14/11/14 09:01:04 INFO impl.YarnClientImpl: Submitted application application_1415823260172_0004
14/11/14 09:01:04 INFO mapreduce.Job: The url to track the job: http://hrn.bkw-hdp.ch:8088/proxy/application_1415823260172_0004/
14/11/14 09:01:04 INFO mapreduce.Job: Running job: job_1415823260172_0004
14/11/14 09:01:12 INFO mapreduce.Job: Job job_1415823260172_0004 running in uber mode : false
14/11/14 09:01:12 INFO mapreduce.Job:  map 0% reduce 0%
14/11/14 09:01:17 INFO mapreduce.Job: Task Id : attempt_1415823260172_0004_m_000000_0, Status : FAILED
Error: Java heap space
14/11/14 09:01:22 INFO mapreduce.Job: Task Id : attempt_1415823260172_0004_m_000000_1, Status : FAILED
Error: Java heap space
14/11/14 09:01:27 INFO mapreduce.Job: Task Id : attempt_1415823260172_0004_m_000000_2, Status : FAILED
Error: Java heap space
14/11/14 09:01:34 INFO mapreduce.Job:  map 100% reduce 100%
14/11/14 09:01:34 INFO mapreduce.Job: Job job_1415823260172_0004 failed with state FAILED due to: Task failed task_1415823260172_0004_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

14/11/14 09:01:35 INFO mapreduce.Job: Counters: 12
    Job Counters 
        Failed map tasks=4
        Launched map tasks=4
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=14241
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=14241
        Total vcore-seconds taken by all map tasks=14241
        Total megabyte-seconds taken by all map tasks=116662272
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
14/11/14 09:01:35 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, : hadoop streaming failed with error code 1

For completeness, here is the log for the first of the four failed map tasks for the example program:

Log Type: syslog

Log Length: 3526
2014-11-14 09:01:15,374 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-11-14 09:01:15,409 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
2014-11-14 09:01:15,489 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-11-14 09:01:15,489 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2014-11-14 09:01:15,502 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2014-11-14 09:01:15,503 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1415823260172_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@7ad66ecc)
2014-11-14 09:01:15,595 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2014-11-14 09:01:16,025 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /grid/00/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004,/grid/01/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004,/grid/02/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004,/grid/03/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004,/grid/04/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004,/grid/05/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004,/grid/06/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004,/grid/07/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004,/grid/08/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004,/grid/09/hadoop/yarn/local/usercache/vcn_osd/appcache/application_1415823260172_0004
2014-11-14 09:01:16,587 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2014-11-14 09:01:17,061 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2014-11-14 09:01:17,346 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://hnn.bkw-hdp.ch:8020/tmp/file53c624829a08:273+274
2014-11-14 09:01:17,397 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2014-11-14 09:01:17,398 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
2014-11-14 09:01:17,409 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2014-11-14 09:01:17,416 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2014-11-14 09:01:17,619 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:963)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:419)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
derrickoswald commented 9 years ago

Reverting to rmr2 version 3.1.0 (rmr2_3.1.0.tar.gz), found here: https://github.com/RevolutionAnalytics/rmr2/tree/master/build, produces the same error. So does version 3.0.0 (rmr2_3.0.0.tar.gz).

derrickoswald commented 9 years ago

Installing the rmr2 package yields a number of suspicious messages:

> install.packages("rmr2_3.2.0.tar.gz", repos=NULL, source=TRUE)
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()
14/11/14 11:47:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/14 11:48:00 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
* installing *source* package ‘rmr2’ ...
** libs
g++ -m64 -I/usr/include/R -DNDEBUG  -I/usr/local/include   `/usr/lib64/R/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic  -c extras.cpp -o extras.o

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()
14/11/14 11:48:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/14 11:48:04 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
extras.cpp: In function ‘SEXPREC* vsum(SEXPREC*)’:
extras.cpp:22: warning: comparison between signed and unsigned integer expressions
g++ -m64 -I/usr/include/R -DNDEBUG  -I/usr/local/include   `/usr/lib64/R/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic  -c hbase-to-df.cpp -o hbase-to-df.o

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()
14/11/14 11:48:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/14 11:48:11 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
hbase-to-df.cpp: In function ‘SEXPREC* raw_list_to_character(SEXPREC*)’:
hbase-to-df.cpp:27: warning: comparison between signed and unsigned integer expressions
hbase-to-df.cpp: In function ‘SEXPREC* hbase_to_df(SEXPREC*, SEXPREC*)’:
hbase-to-df.cpp:56: warning: comparison between signed and unsigned integer expressions
hbase-to-df.cpp:60: warning: comparison between signed and unsigned integer expressions
hbase-to-df.cpp:64: warning: comparison between signed and unsigned integer expressions
g++ -m64 -I/usr/include/R -DNDEBUG  -I/usr/local/include   `/usr/lib64/R/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic  -c keyval.cpp -o keyval.o

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()
14/11/14 11:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/14 11:48:19 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
keyval.cpp: In function ‘SEXPREC* sapply_rmr_length(SEXPREC*)’:
keyval.cpp:57: warning: comparison between signed and unsigned integer expressions
keyval.cpp: In function ‘SEXPREC* sapply_rmr_length_lossy_data_frame(SEXPREC*)’:
keyval.cpp:64: warning: comparison between signed and unsigned integer expressions
keyval.cpp: In function ‘SEXPREC* sapply_length_keyval(SEXPREC*)’:
keyval.cpp:79: warning: comparison between signed and unsigned integer expressions
keyval.cpp: In function ‘SEXPREC* sapply_null_keys(SEXPREC*)’:
keyval.cpp:86: warning: comparison between signed and unsigned integer expressions
keyval.cpp: In function ‘SEXPREC* sapply_is_list(SEXPREC*)’:
keyval.cpp:94: warning: comparison between signed and unsigned integer expressions
keyval.cpp: In function ‘SEXPREC* lapply_key_val(SEXPREC*, std::string)’:
keyval.cpp:101: warning: comparison between signed and unsigned integer expressions
keyval.cpp: In function ‘SEXPREC* are_factor(SEXPREC*)’:
keyval.cpp:115: warning: comparison between signed and unsigned integer expressions
keyval.cpp: In function ‘SEXPREC* are_data_frame(SEXPREC*)’:
keyval.cpp:129: warning: comparison between signed and unsigned integer expressions
keyval.cpp: In function ‘SEXPREC* are_matrix(SEXPREC*)’:
keyval.cpp:136: warning: comparison between signed and unsigned integer expressions
g++ -m64 -I/usr/include/R -DNDEBUG  -I/usr/local/include   `/usr/lib64/R/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic  -c t-list.cpp -o t-list.o

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()
14/11/14 11:48:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/14 11:48:26 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
t-list.cpp: In function ‘SEXPREC* t_list(SEXPREC*)’:
t-list.cpp:27: warning: comparison between signed and unsigned integer expressions
t-list.cpp:29: warning: comparison between signed and unsigned integer expressions
t-list.cpp:31: warning: comparison between signed and unsigned integer expressions
g++ -m64 -I/usr/include/R -DNDEBUG  -I/usr/local/include   `/usr/lib64/R/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic  -c typed-bytes.cpp -o typed-bytes.o

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()
14/11/14 11:48:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/14 11:48:33 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
typed-bytes.cpp: In function ‘T unserialize_numeric(const raw&, unsigned int&) [with T = double]’:
typed-bytes.cpp:136: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘std::vector<T, std::allocator<_Tp1> > unserialize_vector(const raw&, unsigned int&, int) [with T = std::basic_string<char, std::char_traits<char>, std::allocator<char> >]’:
typed-bytes.cpp:188: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘Rcpp::List unserialize_list(const raw&, unsigned int&)’:
typed-bytes.cpp:201: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘Rcpp::List unserialize_map(const raw&, unsigned int&)’:
typed-bytes.cpp:217: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘Rcpp::RObject unserialize(const raw&, unsigned int&, int)’:
typed-bytes.cpp:286: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘void serialize_noattr(const Rcpp::RObject&, raw&, bool)’:
typed-bytes.cpp:478: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp:482: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp:509: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp:515: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘std::vector<T, std::allocator<_Tp1> > unserialize_vector(const raw&, unsigned int&, int) [with T = char]’:
typed-bytes.cpp:191:   instantiated from here
typed-bytes.cpp:180: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘std::vector<T, std::allocator<_Tp1> > unserialize_vector(const raw&, unsigned int&, int) [with T = unsigned char]’:
typed-bytes.cpp:240:   instantiated from here
typed-bytes.cpp:180: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘std::vector<T, std::allocator<_Tp1> > unserialize_vector(const raw&, unsigned int&, int) [with T = bool]’:
typed-bytes.cpp:300:   instantiated from here
typed-bytes.cpp:180: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘std::vector<T, std::allocator<_Tp1> > unserialize_vector(const raw&, unsigned int&, int) [with T = int]’:
typed-bytes.cpp:303:   instantiated from here
typed-bytes.cpp:180: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘std::vector<T, std::allocator<_Tp1> > unserialize_vector(const raw&, unsigned int&, int) [with T = long int]’:
typed-bytes.cpp:321:   instantiated from here
typed-bytes.cpp:180: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘std::vector<T, std::allocator<_Tp1> > unserialize_vector(const raw&, unsigned int&, int) [with T = float]’:
typed-bytes.cpp:324:   instantiated from here
typed-bytes.cpp:180: warning: comparison between signed and unsigned integer expressions
typed-bytes.cpp: In function ‘std::vector<T, std::allocator<_Tp1> > unserialize_vector(const raw&, unsigned int&, int) [with T = double]’:
typed-bytes.cpp:327:   instantiated from here
typed-bytes.cpp:180: warning: comparison between signed and unsigned integer expressions

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()
14/11/14 11:48:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/14 11:48:44 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
g++ -m64 -shared -L/usr/local/lib64 -o rmr2.so extras.o hbase-to-df.o keyval.o t-list.o typed-bytes.o -L/usr/lib64/R/lib -lR

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()
14/11/14 11:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/14 11:48:48 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
((which hbase && (mkdir -p ../inst; cd hbase-io; sh build_linux.sh; cp build/dist/* ../../inst)) || echo "can't build hbase IO classes, skipping" >&2)
/usr/bin/hbase
build_linux.sh: line 163: [: missing `]'
Using /usr/lib/hadoop-mapreduce as hadoop home
Using /usr/lib/hbase as hbase home

Copying libs into local build directory
Cannot find hbase jars in hbase home
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /usr/lib64/R/library/rmr2/libs
** R
** preparing package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘quickcheck’
** help
*** installing help indices
  converting help for package ‘rmr2’
    finding HTML links ... done
    bigdataobject                           html
    dfs.empty                               html
    equijoin                                html
    fromdfstodfs                            html
    keyval                                  html
    make.io.format                          html
    mapreduce                               html
    rmr-package                             html
    rmr.options                             html
    rmr.sample                              html
    rmr.str                                 html
    scatter                                 html
    status                                  html
    tomaptoreduce                           html
    vsum                                    html
** building package indices
** testing if installed package can be loaded

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()
14/11/14 11:48:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/14 11:48:53 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
* DONE (rmr2)
Making 'packages.html' ... done
> 
derrickoswald commented 9 years ago

Solved (or at least worked around). Based on https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/V6Td-XRQC_8, I adjusted the mapreduce.task.io.sort.mb value down from 1024 to 64, and this allowed the small example program to work correctly.
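For reference, a minimal sketch (not necessarily the exact code used) of passing that override through the same backend.parameters mechanism shown earlier in this thread. mapreduce.task.io.sort.mb sizes the in-memory sort buffer allocated in MapOutputBuffer.init, which is exactly where the OutOfMemoryError in the task log above is thrown, so it has to fit comfortably inside the map task's -Xmx:

rmr.options (
  backend.parameters = list (
    hadoop = list (
      # keep the sort buffer well below the 800 MB map-task heap used above
      D = "mapreduce.task.io.sort.mb=64",
      D = "mapreduce.map.java.opts=-Xmx800M",
      D = "mapreduce.reduce.java.opts=-Xmx800M"
    )
  )
)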

piccolbo commented 9 years ago

Thanks for researching this and reporting back. Since you reproduced the problem against three different versions of rmr2, I am less inclined to think it's a problem with the way rmr2 sets some Hadoop properties. It seems your mapreduce.task.io.sort.mb was set to 10x the default, and that may not be compatible with the other settings. Hadoop MapReduce has lots of settings that require an understanding of the internals and that can interact with each other, and that's not a good thing. There are even research projects about automatic configuration.
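To make that interaction concrete, here is a hedged sketch of a self-consistent set of sizes (illustrative numbers only, not recommendations): the YARN container (memory.mb) bounds the task JVM heap (-Xmx), which in turn has to be large enough to hold the sort buffer (io.sort.mb).

rmr.options (
  backend.parameters = list (
    hadoop = list (
      D = "mapreduce.map.memory.mb=2048",        # YARN container for each map task
      D = "mapreduce.map.java.opts=-Xmx1536M",   # JVM heap, smaller than the container
      D = "mapreduce.task.io.sort.mb=256"        # sort buffer, well below the heap
    )
  )
)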

ShashiGudur commented 8 years ago

I tested on Hortonworks HDP 2.3. I can only run rmr2 version 3.1.0; with the later versions (3.2.0, 3.3.0, and 3.3.1) I get the Java heap space error, but only when there is a reducer, even though my mapreduce.task.io.sort.mb = 64.

My system settings:

Sys.setenv(HADOOP_HOME="/usr/hdp/current/hadoop-client")                                    # hadoop path
Sys.setenv(HADOOP_CMD="/usr/hdp/2.3.2.0-2950/hadoop/bin/hadoop")                            # hadoop command path
Sys.setenv(HADOOP_STREAMING="/usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-streaming.jar")  # streaming jar path
Sys.setenv(HADOOP_HEAPSIZE=2900)

The following are my rmr settings:

rmr.options.env = new.env(parent=emptyenv())

rmr.options.env$backend = "hadoop"
rmr.options.env$profile.nodes = "off"
rmr.options.env$hdfs.tempdir = "/tmp"   # can't check it exists here
rmr.options.env$exclude.objects = NULL

rmr.options.env$backend.parameters =
  list (hadoop = list (cmdenv = "PATH=/usr/local/lib64/R/bin/",
                       D = "mapreduce.map.java.opts=-Xmx1024M",
                       D = "mapreduce.reduce.java.opts=-Xmx2048M",
                       D = "mapred.job.queue.name=other",
                       D = "mapred.tasktracker.map.tasks.maximum",
                       D = "mapred.tasktracker.reduce.tasks.maximum",
                       D = "mapreduce.map.memory.mb = 5120",
                       D = "mapreduce.reduce.memory.mb = 5120",
                       D = "mapreduce.task.io.sort.mb =64",
                       D = "yarn.scheduler.minimum-allocation.mb = 1000",
                       D = "yarn.scheduler.maximum-allocation.mb = 2000"))
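A couple of things may be worth double-checking in that list (an editorial observation, not a confirmed diagnosis): Hadoop's -D options expect key=value with no spaces around the '=', the YARN allocation properties are normally spelled yarn.scheduler.minimum-allocation-mb / yarn.scheduler.maximum-allocation-mb (and are usually cluster-side ResourceManager settings rather than per-job overrides), and the earlier comments in this thread pass the list through rmr.options(backend.parameters = ...) rather than assigning to a rmr.options.env variable. A hedged, cleaned-up sketch of the same parameters:

# Editorial sketch only: whitespace around '=' removed, YARN property names
# spelled with a hyphen before "mb", and the two entries that had no "=value"
# dropped (a -D option needs a key=value pair).
rmr.options (
  backend.parameters = list (
    hadoop = list (
      cmdenv = "PATH=/usr/local/lib64/R/bin/",
      D = "mapreduce.map.java.opts=-Xmx1024M",
      D = "mapreduce.reduce.java.opts=-Xmx2048M",
      D = "mapred.job.queue.name=other",
      D = "mapreduce.map.memory.mb=5120",
      D = "mapreduce.reduce.memory.mb=5120",
      D = "mapreduce.task.io.sort.mb=64",
      D = "yarn.scheduler.minimum-allocation-mb=1000",  # normally a cluster-side setting
      D = "yarn.scheduler.maximum-allocation-mb=2000"   # normally a cluster-side setting
    )
  )
)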

@derrickoswald: were you able to run rmr2 3.2.0 or later without the Java heap space error? I tried all the versions after 3.1.0, but my job fails at the reducer.

Thanks Shashi

saargolde commented 8 years ago

rmr2 V. 3.3.1 works on my HDP sandbox 2.3 without any problem, both with the configuration you defined above and with my (much shorter) setup, which includes:

Sys.setenv("HADOOP_CMD"="/usr/bin/hadoop") Sys.setenv("HADOOP_STREAMING"="/usr/hdp/2.3.0.0-2557/hadoop-mapreduce/hadoop-streaming-2.7.1.2.3.0.0-2557.jar")

rmr.options(backend.parameters = list(
  hadoop = list(D = "mapreduce.map.memory.mb=1024")))
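With that in place, an editorial sketch of a minimal smoke test, along the lines of the examples earlier in this thread (not the exact code run here), exercising both map and reduce:

library(rmr2)

# tiny end-to-end job: key each integer by parity, then count per key
ints = to.dfs(1:100)
out  = mapreduce(input  = ints,
                 map    = function(k, v) keyval(v %% 2, v),
                 reduce = function(k, vv) keyval(k, length(vv)))
from.dfs(out)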

It might be helpful to post your code and/or the error messages, because the answers to a few basic questions matter here.

There may be more questions to ask, but that's a good start.

ShashiGudur commented 8 years ago

Here is the code and the error.

If I use only the map function it works perfectly, but it fails when I use a reducer (or map and reduce together):


rbingroups = rbinom(30, n = 50, prob = 0.5)
groups <- tapply(rbingroups, rbingroups, length)
groups = to.dfs(rbingroups)
WARNING: Use "yarn jar" to launch YARN applications.
15/11/25 13:32:40 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/11/25 13:32:40 INFO compress.CodecPool: Got brand-new compressor [.deflate]
rmr.options.env = new.env(parent=emptyenv())

rmr.options.env$backend = "hadoop"
rmr.options.env$profile.nodes = "off"
rmr.options.env$hdfs.tempdir = "/tmp"   # can't check it exists here
rmr.options.env$exclude.objects = NULL

rmr.options.env$backend.parameters =
  list (hadoop = list (cmdenv = "PATH=/usr/local/lib64/R/bin/",
                       D = "mapreduce.map.java.opts=-Xmx1024M",
                       D = "mapreduce.reduce.java.opts=-Xmx2048M",
                       D = "mapred.job.queue.name=other",
                       D = "mapred.tasktracker.map.tasks.maximum",
                       D = "mapred.tasktracker.reduce.tasks.maximum",
                       D = "mapreduce.map.memory.mb = 5120",
                       D = "mapreduce.reduce.memory.mb = 5120",
                       D = "mapreduce.task.io.sort.mb =64",
                       D = "yarn.scheduler.minimum-allocation.mb = 1000",
                       D = "yarn.scheduler.maximum-allocation.mb = 2000"))

rbingroups = rbinom(30, n = 50, prob = 0.5)
groups <- tapply(rbingroups, rbingroups, length)
groups = to.dfs(rbingroups)
WARNING: Use "yarn jar" to launch YARN applications.
15/11/25 13:36:59 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/11/25 13:36:59 INFO compress.CodecPool: Got brand-new compressor [.deflate]
out = mapreduce(input = groups, map = function(k, v) keyval(v, 1), reduce = function(k, vv) keyval(k, length(vv)))
WARNING: Use "yarn jar" to launch YARN applications.
packageJobJar: [] [/usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-streaming-2.7.1.2.3.2.0-2950.jar] /tmp/streamjob6715154162749713919.jar tmpDir=null
15/11/25 13:37:18 INFO impl.TimelineClientImpl: Timeline service address: http://znlhacdq0003.amer.zurich.corp:8188/ws/v1/timeline/
15/11/25 13:37:18 INFO impl.TimelineClientImpl: Timeline service address: http://znlhacdq0003.amer.zurich.corp:8188/ws/v1/timeline/
15/11/25 13:37:18 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 74288 for uszllf7 on ha-hdfs:ZHDPDEV
15/11/25 13:37:18 INFO security.TokenCache: Got dt for hdfs://ZHDPDEV; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ZHDPDEV, Ident: (HDFS_DELEGATION_TOKEN token 74288 for uszllf7)
15/11/25 13:37:20 INFO mapred.FileInputFormat: Total input paths to process : 1
15/11/25 13:37:20 INFO mapreduce.JobSubmitter: number of splits:2
15/11/25 13:37:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447621077216_4452
15/11/25 13:37:20 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ZHDPDEV, Ident: (HDFS_DELEGATION_TOKEN token 74288 for uszllf7)
15/11/25 13:37:21 INFO impl.YarnClientImpl: Submitted application application_1447621077216_4452
15/11/25 13:37:21 INFO mapreduce.Job: The url to track the job: http://znlhacdq0001.amer.zurich.corp:8088/proxy/application_1447621077216_4452/
15/11/25 13:37:21 INFO mapreduce.Job: Running job: job_1447621077216_4452
15/11/25 13:37:33 INFO mapreduce.Job: Job job_1447621077216_4452 running in uber mode : false
15/11/25 13:37:33 INFO mapreduce.Job:  map 0% reduce 0%
15/11/25 13:37:43 INFO mapreduce.Job: Task Id : attempt_1447621077216_4452_m_000000_0, Status : FAILED
Error: Java heap space
15/11/25 13:37:43 INFO mapreduce.Job: Task Id : attempt_1447621077216_4452_m_000001_0, Status : FAILED
Error: Java heap space
15/11/25 13:37:52 INFO mapreduce.Job: Task Id : attempt_1447621077216_4452_m_000000_1, Status : FAILED
Error: Java heap space
15/11/25 13:37:53 INFO mapreduce.Job: Task Id : attempt_1447621077216_4452_m_000001_1, Status : FAILED
Error: Java heap space
15/11/25 13:37:57 INFO mapreduce.Job: Task Id : attempt_1447621077216_4452_m_000000_2, Status : FAILED
Error: Java heap space
15/11/25 13:37:58 INFO mapreduce.Job: Task Id : attempt_1447621077216_4452_m_000001_2, Status : FAILED
Error: Java heap space
15/11/25 13:38:03 INFO mapreduce.Job:  map 100% reduce 100%
15/11/25 13:38:03 INFO mapreduce.Job: Job job_1447621077216_4452 failed with state FAILED due to: Task failed task_1447621077216_4452_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/11/25 13:38:03 INFO mapreduce.Job: Counters: 13
    Job Counters 
        Failed map tasks=7
        Killed map tasks=1
        Launched map tasks=8
        Other local map tasks=6
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=44129
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=44129
        Total vcore-seconds taken by all map tasks=44129
        Total megabyte-seconds taken by all map tasks=451880960
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
15/11/25 13:38:03 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, : hadoop streaming failed with error code 1
15/11/25 13:38:12 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 3600 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://ZHDPDEV/tmp/file113f63ecb6cc' to trash at: hdfs://ZHDPDEV/user/uszllf7/.Trash/Current
15/11/25 13:38:19 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 3600 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://ZHDPDEV/tmp/file113f7614dedf' to trash at: hdfs://ZHDPDEV/user/uszllf7/.Trash/Current

saargolde commented 8 years ago

Try with my setup (detailed earlier in this thread), and if that doesn't work, try re-installing. It's a mystery...
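For reference, a sketch of the source-install command used earlier in this thread; the tarball filename below assumes version 3.3.1 and should be adjusted to whichever build is being installed:

# re-install rmr2 from a source tarball (filename is an assumption)
install.packages("rmr2_3.3.1.tar.gz", repos = NULL, type = "source")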
