RevolutionAnalytics / RHadoop

https://github.com/RevolutionAnalytics/RHadoop/wiki

memory trouble in rmr2 #216

Closed: Logosxxw closed this issue 9 years ago

Logosxxw commented 9 years ago
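userItemSimi <- mapreduce(
  input = nearestUsersItems,
  output = "/user/rlan/userItemSimi",
  map = function(k, v) {
    # Name the five columns, then emit (user1, item) as the key and simi as the value.
    names(v) <- c("user1", "user2", "simi", "item", "vtm")
    keyval(v[, c("user1", "item")], v$simi)
  },
  reduce = function(k, v) {
    # Sum the similarities collected for each (user1, item) key.
    keyval(k, sum(v))
  }
)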

R version 3.1.0, Hadoop 2.0.0-cdh4.7.0

The input is a keyval whose key is NULL and whose value is a data frame with 700,000 rows and 5 variables. When running on Hadoop, the job gets stuck in the map phase (three servers, 2 parallel map tasks per server), and each server nearly runs out of its 16 GB of memory. There seems to be something wrong with memory management. Can anyone help? Thanks very much!
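
For reference, the aggregation this job is meant to perform (summing simi for each user1/item pair) can be sketched in base R on a small sample, which may help separate the logic from the memory behaviour on Hadoop. This is only an illustration: the toy data frame below is made up, and aggregate() stands in for the map and reduce steps rather than reproducing how rmr2 processes the data.

```r
# Hypothetical toy data with the same five columns that the map function names.
v <- data.frame(
  user1 = c("a", "a", "a", "b"),
  user2 = c("x", "y", "z", "x"),
  simi  = c(0.5, 0.25, 1, 2),
  item  = c("i1", "i1", "i2", "i1"),
  vtm   = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Sum simi within each (user1, item) group, mirroring what the reduce step does per key.
userItemSimi <- aggregate(simi ~ user1 + item, data = v, FUN = sum)
print(userItemSimi)
```

If the result on a sample looks right, the problem is more likely in how the 700,000-row input is handled by the mappers than in the map and reduce functions themselves.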

piccolbo commented 9 years ago

You need to report this in the rmr2 issue tracker, thanks.