byu777 opened 8 years ago
I figured this out now. I had installed rmr2 from within RStudio, and somehow the library was not available to the script, even though the mapreduce function seemed to run successfully. I was surprised that one of the logs said rmr2 was not found, yet the job still gave me a _SUCCESS!
I eventually installed rmr2 fresh in R (using sudo R), with the required packages, reshape2 and caTools, and everything seems to work fine now.
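A likely reason the RStudio install was not picked up is that packages installed from an interactive session usually go to a per-user library, while the Rscript processes launched by Hadoop streaming may run as a different user. The sketch below is a hypothetical helper (`check_pkg_location` is not part of rmr2) that reports where each package is installed and whether that directory is one of R's system-wide libraries (`.Library` or `.Library.site`). It assumes that system-wide libraries are the ones the task processes will see.

```r
# Hypothetical helper: for each package, report the library it was found in
# and whether that library is system-wide (.Library or .Library.site).
# Assumption: only system-wide libraries are visible to Hadoop task processes.
check_pkg_location <- function(pkgs) {
  system_libs <- normalizePath(c(.Library, .Library.site), mustWork = FALSE)
  rows <- lapply(pkgs, function(p) {
    path <- find.package(p, quiet = TRUE)  # character(0) if not installed
    if (length(path) == 0) {
      return(data.frame(package = p, library = NA_character_,
                        system_wide = FALSE, stringsAsFactors = FALSE))
    }
    lib <- normalizePath(dirname(path[1]), mustWork = FALSE)
    data.frame(package = p, library = lib,
               system_wide = lib %in% system_libs, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

check_pkg_location(c("rmr2", "reshape2", "caTools"))
```

Running this from the same R binary that Hadoop invokes (for example via `Rscript`) rather than from RStudio gives the more relevant answer.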
Hey @byu777, I am facing the same problem. Can you please help me out with this? I tried installing rmr2 using R CMD, but the output is still the same.
Hi byu777/VJ-Vikvy, any luck solving this issue? I am also facing the same issue.
Hey @Surender1984, please install the rmr2 package using:
sudo R CMD INSTALL rmr2.tar.gz
But before doing that, install all the required packages as the root user. This will solve your issue. Please let me know if you face any issues.
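Before running `sudo R CMD INSTALL rmr2.tar.gz`, it can help to list which dependencies are still missing from the system-wide libraries. The dependency vector below is an assumption based on the Imports field of rmr2's DESCRIPTION file; check the DESCRIPTION inside your own tarball, since it varies between versions. `missing_deps` is a hypothetical helper, not part of any package.

```r
# Assumed dependency list (Imports of rmr2's DESCRIPTION); verify it against
# the DESCRIPTION file in the rmr2 tarball you are installing.
rmr2_deps <- c("Rcpp", "RJSONIO", "digest", "functional", "reshape2",
               "stringr", "plyr", "caTools")

# Hypothetical helper: return the packages in `pkgs` that are not installed
# in any of the library directories listed in `lib`.
missing_deps <- function(pkgs, lib = .libPaths()) {
  installed <- rownames(installed.packages(lib.loc = lib))
  setdiff(pkgs, installed)
}

# Only look at the system-wide libraries, which are shared by all users.
todo <- missing_deps(rmr2_deps, lib = c(.Library, .Library.site))
if (length(todo) > 0) {
  message("Not installed system-wide: ", paste(todo, collapse = ", "))
  # From a root session (e.g. `sudo R`), something like the line below would
  # install them; the CRAN mirror URL is only an example:
  # install.packages(todo, repos = "https://cloud.r-project.org")
}
```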
Also, I think rmr2 and rhdfs are dead and we need to switch to Spark. What is your opinion on this?
@byu777 Problem still not solved for me :(
Code:

```r
Sys.setenv(HADOOP_HOME="/home/sharwin/Programs/hadoop-2.7.5")
Sys.setenv(HADOOP_CMD="/home/sharwin/Programs/hadoop-2.7.5/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/home/sharwin/Programs/hadoop-2.7.5/share/hadoop/tools/lib/hadoop-streaming-2.7.5.jar")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/java-9-oracle")

library(rJava)
library(rhdfs)
library(rmr2)
library(reshape2)
library(caTools)

hdfs.init()

# Clear previous output
hdfs.rmr('/test/out')

# Map: split each input line into words and emit a (word, 1) pair per word
map <- function(k, lines) {
  # '\\s+' splits on runs of whitespace; a single '\\s' produces empty "words"
  words.list <- strsplit(lines, '\\s+')
  words <- unlist(words.list)
  words <- words[nzchar(words)]  # drop empty tokens from leading whitespace
  return(keyval(words, 1))
}

# Reduce: sum the counts emitted for each word
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}

wordcount <- function(input, output = NULL) {
  mapreduce(input = input, output = output, input.format = "text",
            map = map, reduce = reduce)
}

## Read text files from HDFS folder /test/data
hdfs.root <- '/test'
hdfs.data <- file.path(hdfs.root, 'data')
## Save the result in HDFS folder /test/out
hdfs.out <- file.path(hdfs.root, 'out')

## Submit job
out <- wordcount(hdfs.data, hdfs.out)

## Fetch results from HDFS
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors = FALSE)
colnames(results.df) <- c('word', 'count')
head(results.df)
```
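Since rhdfs and rmr2 locate Hadoop through the `HADOOP_CMD` and `HADOOP_STREAMING` environment variables set at the top of this script, a quick sanity check is to confirm those paths exist before calling `hdfs.init()`. `check_hadoop_env` is a hypothetical helper; it only checks that the files exist, not that the Hadoop and Java versions work together.

```r
# Hypothetical helper: TRUE for each variable that is set and points at an
# existing file, FALSE otherwise.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  vals <- Sys.getenv(vars, unset = NA)
  ok <- !is.na(vals) & nzchar(vals)   # set and non-empty
  ok[ok] <- file.exists(vals[ok])     # and the file is really there
  names(ok) <- vars
  ok
}

status <- check_hadoop_env()
if (!all(status)) {
  warning("Check these paths: ", paste(names(status)[!status], collapse = ", "))
}
```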
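To separate an R-side problem from a Hadoop-side one, it may help to know what `results.df` ought to contain. The sketch below reproduces the intended map (split lines into words, emit 1 per word) and reduce (sum per word) steps in plain R, with no Hadoop or rmr2 involved; `local_wordcount` is a hypothetical name. rmr2 also documents a local backend (`rmr.options(backend = "local")`) that runs `mapreduce` without a cluster, which would be the more direct check once rmr2 itself loads.

```r
# Plain-R stand-in for the word-count map and reduce (no Hadoop, no rmr2).
local_wordcount <- function(lines) {
  # Map step: split every line on runs of whitespace, drop empty tokens
  words <- unlist(strsplit(lines, "\\s+"))
  words <- words[nzchar(words)]
  if (length(words) == 0) {
    return(data.frame(word = character(0), count = integer(0)))
  }
  # Reduce step: sum a count of 1 for every occurrence of each word
  counts <- tapply(rep(1L, length(words)), words, sum)
  data.frame(word = names(counts), count = as.integer(counts),
             stringsAsFactors = FALSE)
}

local_wordcount(c("the cat", "the  hat"))
# word/count rows: cat 1, hat 1, the 2
```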
@sharwinbobde please install the rmr2 package using:
sudo R CMD INSTALL rmr2.tar.gz
But before doing that, install all the required packages as the root user. This will solve your issue. Please let me know if you face any issues.
I tried the following simple script with rmr2 on Cloudera QuickStart 5.7.0, but mapreduce does not generate any results. Here is the script:
Here is the output:
to.dfs and from.dfs do work; I tried the following: