memory trouble in rmr2

userItemSimi <- mapreduce(
  input = nearestUsersItems,
  output = "/user/rlan/userItemSimi",
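  map = function(k, v) {
    # k is NULL; v holds rows of the input data frame
    names(v) <- c("user1", "user2", "simi", "item", "vtm")
    # emit one (user1, item) key per row, with simi as the value
    keyval(v[, c("user1", "item")], v$simi)
  },
  reduce = function(k, v) {
    # sum the simi values collected for each (user1, item) key
    keyval(k, sum(v))
  }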
)

R version 3.1.0, Hadoop 2.0.0-cdh4.7.0

The input is a keyval whose key is NULL and whose value is a data frame with 700,000 rows and 5 variables. When running on Hadoop, the job gets stuck in the map phase (three servers, two map tasks running in parallel per server), and each server nearly runs out of its 16 GB of memory. It looks as though something is wrong with memory management. Can anyone help? Thanks very much!
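For checking results on a small sample, the map/reduce pair above amounts to a group-by-sum: rows are keyed by (user1, item) and the simi values for each key are added up. The sketch below reproduces that computation in plain R with base `aggregate()` on a tiny hypothetical data frame. The values and the name `userItemSimiLocal` are made up for illustration. It does not use Hadoop or rmr2 and is not a fix for the memory problem, only a reference result to compare the job's output against.

```r
# Hypothetical toy stand-in for the data frame stored in nearestUsersItems
# (the real input has 700,000 rows); column names follow the map function.
v <- data.frame(
  user1 = c(1, 1, 2, 2),
  user2 = c(3, 4, 3, 5),
  simi  = c(0.5, 0.25, 1.0, 0.5),
  item  = c(10, 10, 10, 20),
  vtm   = c(1, 1, 1, 1)
)

# Same computation as the map + reduce pair:
# group rows by (user1, item) and sum simi within each group.
userItemSimiLocal <- aggregate(simi ~ user1 + item, data = v, FUN = sum)
print(userItemSimiLocal)
```

The result has one row per distinct (user1, item) pair, which is what the reduce step should emit once per key.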

RevolutionAnalytics / RHadoop, issue #216