Closed andrewmilkowski closed 10 years ago
Please close this ticket... this was a fundamental user error on my part.
Specifying the following in mapred-site.xml:
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
<description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>10</value>
<description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
</property>
allows (up to) 10 reducers to run at once, and in my sample run I specified 3 reducers (hence the 3 R processes below). For a test run on a single machine this is just not appropriate, not with the size of the input data in this project.
Swap: 3260408k total, 628824k used, 2631584k free, 226596k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12662 mapred 20 0 667m 480m 4284 R 93.8 12.6 2:59.71 R
12702 mapred 20 0 662m 475m 4284 R 49.2 12.4 2:09.04 R
12694 mapred 20 0 819m 632m 4392 R 48.9 16.5 2:04.52 R
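To put numbers on why this is too much for one machine, here is a rough sketch that adds up the three R processes from the `top` output above. The rows are copied from that output; the physical RAM figure is only an estimate derived from RES and %MEM, and the "10 processes" line is a hypothetical what-if that assumes the same average footprint per process, which the thread does not confirm.

```python
# Rows copied from the top output above: (PID, RES in MB, %MEM).
R_PROCESSES = [
    (12662, 480, 12.6),
    (12702, 475, 12.4),
    (12694, 632, 16.5),
]


def summarize(rows):
    """Return (total RES in MB, total %MEM, estimated physical RAM in MB)."""
    total_res_mb = sum(res for _, res, _ in rows)
    total_pct = sum(pct for _, _, pct in rows)
    # RES / (%MEM / 100) approximates physical RAM for each row;
    # averaging over the rows smooths out rounding in the top display.
    est_ram_mb = sum(res / (pct / 100.0) for _, res, pct in rows) / len(rows)
    return total_res_mb, total_pct, est_ram_mb


total_res_mb, total_pct, est_ram_mb = summarize(R_PROCESSES)
avg_res_mb = total_res_mb / len(R_PROCESSES)

print(f"R processes resident: {total_res_mb} MB ({total_pct:.1f}% of RAM)")
print(f"Estimated physical RAM: {est_ram_mb:.0f} MB")
# Hypothetical: all 10 reduce slots busy with a process of average size.
print(f"10 such processes would need about {10 * avg_res_mb:.0f} MB")
```

The three R processes alone hold 1587 MB resident, about 41.5% of memory, on a box that is already using swap. Under the same-footprint assumption, filling all 10 reduce slots would need more memory than the machine appears to have.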
Glad you found the error. I'll close...
No problem @laserson, now after all these tests I need to get a new Mac motherboard (joking).
So the only thing that remains is the small input data issue in rmr2 (https://github.com/RevolutionAnalytics/rmr2/issues/69).
If you have an idea, please do let me know... this is about the general stability of the solution (a graceful run).
In addition, I am looking to swap the standard R random forest implementation for Random Jungle (this might shave some cycles, although Revolution R is already precompiled with Intel MKL tight...
Complete source code for fitRandomForest.R (the changed line relevant to the ticket is the addition of D="mapred.reduce.tasks=10").
Screen shots are also attached.
Also, I have noticed that although 3 mappers were created, the 3rd mapper began working on the input data ONLY after the first 2 mappers had completed their work ![screen shot 2013-10-01 at 9 11 43 am]
--- MAPPER ---
mappers (3 specified by default): only 2 are processing data initially; the 3rd mapper starts processing data later (never all 3 in parallel)
--- REDUCER ---
reducers (10 specified): only 2 are processing data
The output from top also shows 2 R processes (one per active reducer). The mapper task breakdown is similar: while 3 mappers are active, only 2 R processes are running (the 3rd one is sitting idle).
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31373 mapred 20 0 667m 480m 4284 R 98.0 12.6 7:09.47 R
31370 mapred 20 0 670m 483m 4284 R 93.4 12.6 7:07.21 R
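For what it's worth, the pattern described above (3 tasks launched, only 2 doing work, the 3rd starting once one of the first two finishes) is what a scheduler with 2 effective slots would produce. Whether that is the actual cause here is not established in this thread. The sketch below is a toy greedy slot scheduler, not Hadoop's; the task durations are made-up values for illustration only.

```python
import heapq


def schedule(durations, slots):
    """Greedy slot scheduler: each task starts as soon as any slot is free.

    Returns a list of (start, end) times, one per task, in submission order.
    """
    free_at = [0.0] * slots  # min-heap of the time each slot becomes free
    heapq.heapify(free_at)
    timeline = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + duration
        heapq.heappush(free_at, end)
        timeline.append((start, end))
    return timeline


# Hypothetical durations in minutes; NOT measured values from this ticket.
mappers = [7.0, 7.0, 7.0]

for slots in (2, 3):
    print(f"{slots} slots: {schedule(mappers, slots)}")
```

With 2 slots the third task cannot start until time 7.0, when the first slot frees up; with 3 slots all three start at 0.0. If the real cluster behaves like the 2-slot case, checking how many task slots (or cores) are actually available to the task tracker would be a reasonable next step.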