RevolutionAnalytics / RHadoop

https://github.com/RevolutionAnalytics/RHadoop/wiki

huge combiner or reducer failures #100

Closed piccolbo closed 12 years ago

piccolbo commented 12 years ago

It seems that when a combine and a reduce are too big, jobs don't fail: a few tasks are killed, the error logs are not very instructive, and NAs are intermixed with the results where they shouldn't be. It could be a timeout or an out-of-memory condition, but a timeout normally causes the job to fail; I think it is more likely R running out of memory without exiting with any message or error code. This is happening in 1.2.2, so with no rmr C code to speak of (that is only used from 1.3 on). At a minimum, I would like to see task attempts fail or succeed with correct results, not this in-between. Second, is there any way to control the number of combiners and reducers to avoid this issue? There is for reducers, but it is left to the user for now (see the sketch below). Another approach would be to sidestep the use of lists completely in the case of structured data and keep everything as data frames, which are much more compact (a data frame path from input to output). That would bring memory usage down, but it doesn't solve the problem of R not failing when it should.
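A minimal sketch of how the reducer count can be passed through to Hadoop streaming so each reduce task sees a smaller slice of the keys. This assumes the later rmr2-style `backend.parameters` argument and a made-up job; the 1.2.x argument names and input path are hypothetical and may differ:

```r
library(rmr2)  # assumption: rmr2-style API; 1.2.x may expose this differently

# Hypothetical word-bucket job: raise the number of reduce tasks so that
# each reducer's input stays small enough to fit in R's memory.
out <- mapreduce(
  input   = "/tmp/big-input",                       # hypothetical HDFS path
  map     = function(k, v) keyval(v %% 100, 1),
  reduce  = function(k, vv) keyval(k, sum(vv)),
  combine = TRUE,                                   # reuse the reduce as a combiner
  # Pass the streaming property straight to Hadoop: more reduce tasks,
  # less data per task attempt.
  backend.parameters = list(
    hadoop = list(D = "mapred.reduce.tasks=64")
  )
)
```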

piccolbo commented 12 years ago

Now I see this a little differently. It seems that all tasks succeeded after a variable number of attempts. My best guess is that they failed on an out-of-memory issue, but when re-attempted on a more lightly loaded cluster they succeeded. The odd results seem to come from an integer overflow (caused by my own programming error, compounded by R not having long ints).
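For the record, this is how an integer overflow shows up as NAs in plain R, with no rmr involved; the values are just an illustration:

```r
.Machine$integer.max               # 2147483647: the largest value an R integer can hold
2147483647L + 1L                   # NA, with an "integer overflow" warning
sum(c(2000000000L, 2000000000L))   # NA as well: sum() over integers overflows the same way
sum(as.numeric(c(2000000000L, 2000000000L)))  # 4e+09: doubles avoid the overflow
```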