The input is a keyval whose key is NULL and whose value is a data frame with 700,000 rows and 5 variables.
When running on Hadoop, the job gets stuck in the map phase (three servers, 2 parallel map tasks per server), and each server nearly runs out of its 16 GB of memory.
It seems that something is wrong with memory management.
Can anyone help? Thanks very much!
R version 3.1.0, Hadoop 2.0.0-cdh4.7.0