cloudera / poisson_sampling

reducer task stuck at 67% #3

Closed andrewmilkowski closed 11 years ago

andrewmilkowski commented 11 years ago

Hi,

I'm seeing the following in the TaskTracker log (while running fitRandomForrest.R):

tail -f hadoop-hadoop-tasktracker-localhost.localdomain.log

2013-09-29 11:11:38,571 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201309291106_0001m-1871339866 exited with exit code 0. Number of tasks it ran: 1
2013-09-29 11:11:41,190 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309291106_0001_r_000000_0 0.22222224% reduce > copy (2 of 3 at 49.50 MB/s) >
2013-09-29 11:11:44,708 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, dest: 127.0.0.1:42419, bytes: 342109108, op: MAPRED_SHUFFLE, cliID: attempt_201309291106_0001_m_000002_0, duration: 1901216344
2013-09-29 11:11:45,373 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309291106_0001_r_000000_0 0.22222224% reduce > copy (2 of 3 at 49.50 MB/s) >
2013-09-29 11:11:45,422 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309291106_0001_r_000000_0 0.22222224% reduce > copy (2 of 3 at 49.50 MB/s) >
2013-09-29 11:11:47,258 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309291106_0001_r_000000_0 0.66767067% reduce > reduce
2013-09-29 11:11:50,302 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309291106_0001_r_000000_0 0.66767067% reduce > reduce
2013-09-29 11:11:59,379 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309291106_0001_r_000000_0 0.66767067% reduce > reduce
2013-09-29 11:12:02,447 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309291106_0001_r_000000_0 0.66767067% reduce > reduce
2013-09-29 11:12:56,567 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309291106_0001_r_000000_0 0.66767067% reduce > reduce
2013-09-29 11:14:44,687 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309291106_0001_r_000000_0 0.66767067% reduce > reduce

The reducer eventually moves to 68%, but something has gone wrong internally.
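To measure how long the attempt sat at one progress value, the TaskTracker log can be scanned for progress lines. A minimal Python sketch, assuming every progress line has the layout shown in the excerpt above (timestamp, INFO, the org.apache.hadoop.mapred.TaskTracker logger, the attempt ID, a fraction followed by %, then the status text); the names PROGRESS_RE and progress_runs are hypothetical, not part of Hadoop:

```python
import re
from datetime import datetime

# Assumed layout of a TaskTracker progress line, taken from the log excerpt:
# "<yyyy-mm-dd hh:mm:ss,mmm> INFO org.apache.hadoop.mapred.TaskTracker: <attempt> <fraction>% <status>"
PROGRESS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(attempt_\S+) ([0-9.]+)% (.*)$"
)


def progress_runs(lines, attempt):
    """Group consecutive progress reports for one attempt into runs of identical
    (fraction, status); return a list of (first_seen, last_seen, fraction, status)."""
    runs = []
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if m is None or m.group(2) != attempt:
            continue  # other loggers (JvmManager, clienttrace) or other attempts
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        fraction, status = float(m.group(3)), m.group(4).strip()
        if runs and runs[-1][2:] == (fraction, status):
            runs[-1] = (runs[-1][0], ts, fraction, status)  # extend the current run
        else:
            runs.append((ts, ts, fraction, status))
    return runs
```

For example, `progress_runs(open("hadoop-hadoop-tasktracker-localhost.localdomain.log"), "attempt_201309291106_0001_r_000000_0")` returns one tuple per plateau. On the excerpt above, the last plateau is 0.66767067 "reduce > reduce", lasting about 177 s (11:11:47 to 11:14:44). That only shows where the attempt is spending its time, not whether it is hung.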

My environment:

[amilkowski@localhost hadoop]$ uname -a
Linux localhost.localdomain 2.6.32-358.18.1.el6.x86_64 #1 SMP Tue Aug 27 14:23:09 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux

and I'm using the Cloudera distribution: 0.20.2 cdh3u6

Please advise. If more log samples or environment data are needed, just ask and I will provide them.

I'm also trying to run debugging.R. What would be the procedure to generate training.small.csv as its input, so I can troubleshoot this further?

thanks much!
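One way to produce a smaller debugging input is to sample rows from the full training file. A minimal Python sketch, assuming the full data is a CSV with a header row (the name training.csv and the function sample_csv are hypothetical) and that debugging.R only needs a row subset with the same columns; neither assumption is confirmed in this thread. It uses reservoir sampling, so the full file is streamed once and never held in memory:

```python
import csv
import random


def sample_csv(src_path, dst_path, n_rows, seed=42):
    """Copy the header plus a uniform random sample of up to n_rows data rows
    from src_path to dst_path (reservoir sampling); return the rows written."""
    rng = random.Random(seed)  # fixed seed so the debug file is reproducible
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        reservoir = []
        for i, row in enumerate(reader):
            if i < n_rows:
                reservoir.append(row)
            else:
                j = rng.randint(0, i)  # inclusive bounds: each row kept with prob n_rows/(i+1)
                if j < n_rows:
                    reservoir[j] = row
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

Usage, with a hypothetical sample size: `sample_csv("training.csv", "training.small.csv", 10000)`. The sampled rows are not kept in their original order; if debugging.R depends on row order, taking the first N rows instead would be the simpler choice.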

laserson commented 11 years ago

Have you tried letting this computation finish? IIRC, the reducer progress bar really covers 3 separate stages: copy (the shuffle), sort, and reduce. The actual computation happens last, so the tree fitting is really only starting at 67%.
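The numbers in the log fit this picture. A minimal sketch, assuming the classic (MR1) weighting in which copy, sort and reduce each count for one third of a reduce attempt's progress; the helper names overall_progress and phase_breakdown are hypothetical:

```python
# Assumption: classic Hadoop (MR1) reduce attempts report three equally weighted phases.
PHASES = ("copy", "sort", "reduce")


def overall_progress(phase, phase_fraction):
    """Overall progress of a reduce attempt given its current phase and progress within it."""
    return (PHASES.index(phase) + phase_fraction) / len(PHASES)


def phase_breakdown(overall):
    """Inverse mapping: which phase an overall fraction falls in, and how far into it."""
    scaled = overall * len(PHASES)
    idx = min(int(scaled), len(PHASES) - 1)  # clamp 1.0 into the last phase
    return PHASES[idx], scaled - idx


# 2 of 3 map outputs copied gives (2/3) / 3, about the 0.22222224 seen in the log
print(overall_progress("copy", 2 / 3))
# 0.66767067 falls early in the reduce phase, roughly 0.3% into it
print(phase_breakdown(0.66767067))
```

Under this weighting, 0.66767067 means copy and sort are done and the reduce phase (where the R reducer fits the trees) has barely started, which is consistent with a long plateau rather than a failure.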

andrewmilkowski commented 11 years ago

I believe the computation will finish (I just value my Mac's motherboard). I have increased the number of reducers to 10, but note my comment on the blog:

"I also want to note that I was seeing only one reduce task being created, so I tried setting D="mapred.reduce.tasks=10" in the mapred function;

however, this caused only 2 reduce tasks to be created (2 R processes). That is still far too small a number; it won't scale…

24497 mapred 20 0 667m 481m 4284 R 97.8 12.6 4:58.21 R

24492 mapred 20 0 667m 481m 4284 R 97.1 12.6 4:43.82 R

so this is looking less and less like an rmr2 problem and more like a combination of the theoretical underpinnings of random forests and the entropy/structure of the input data"

So the computation will certainly finish; runs did finish when I cut down the input data. The question is a much larger one.
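Two Hadoop behaviours could produce a picture like this, and neither is confirmed in this thread. First, in Hadoop 0.20 each TaskTracker runs at most mapred.tasktracker.reduce.tasks.maximum reduce tasks at a time (default 2), so on a single-node setup only two reducers can be busy at once. Second, assuming the job uses Hadoop's default HashPartitioner, a reduce task only gets data if at least one key hashes to it, so map output with few distinct keys leaves most reducers idle however high mapred.reduce.tasks is set. A minimal Python sketch of the second point; the function names are hypothetical, and string keys hashed like Java's String.hashCode() stand in for rmr2's serialized keys:

```python
# Hypothetical illustration, not rmr2 code. Hadoop's default HashPartitioner computes
#   partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
# Here keys are plain strings hashed like Java's String.hashCode(); rmr2 serializes
# R keys as typed bytes, so partition numbers in a real job will differ.


def java_string_hash(s):
    """Java String.hashCode() for strings of BMP characters, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h & 0x80000000 else h


def hash_partition(key, num_reduce_tasks):
    """Reducer index a key is routed to under the HashPartitioner rule above."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


def busy_reducers(keys, num_reduce_tasks):
    """Indices of reducers that receive at least one record; the others get no input."""
    return sorted({hash_partition(k, num_reduce_tasks) for k in keys})


# Ten reduce tasks, but the map output carries only two distinct keys:
keys = ["1", "2", "1", "2", "1"]
print(busy_reducers(keys, 10))  # at most two reducers receive any data
```

The design point is that the number of reducers doing real work is bounded by the number of distinct keys, not by mapred.reduce.tasks, so the key scheme emitted by the mapper matters as much as the reducer setting.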

laserson commented 11 years ago

Could you open a separate issue for this? Also, could you include your code? (Specifically, how you're specifying the number of reducers.)

andrewmilkowski commented 11 years ago

Sure, will do (new ticket), Uri! Thanks for looking into this.