RevolutionAnalytics / RHadoop

RHadoop
https://github.com/RevolutionAnalytics/RHadoop/wiki

Not able to read HDFS in Map of RMR2 #229

Open sureshappana opened 8 years ago

sureshappana commented 8 years ago

Hi, I am trying to access an HDFS file in the map function of rmr2. (The file is of type cdf.) I am using the following approach, but it is not working.

Normal approach in R (without using MapReduce):

d <- open.ncdf("file.cdf")

This refers to a local file.

Approach I am trying in rmr2:

x <- hdfs.file("file.cdf")
d <- open.ncdf(x)  # this call will go inside the map function

Error: No file found with the specified name. (I even tried giving the absolute path.)

I am replacing the local file reference with an HDFS reference. (I can't use hdfs.read.text.file because my file is not in text format.)

So could anyone tell me whether there is any way to refer to an HDFS file (other than a text file)?

(P.S.: I can't use from.dfs in my map function either, because the file is ~70 MB in size.)

Environment: R 3.2.2, rmr2 3.3.1, Cloudera QuickStart VM 5.5.0

Please let me know if any further information is required.

Thanks

RavikiranCK commented 7 years ago

Hi Suresh ,

Were you able to find a solution to the problem you mentioned? I'm facing the same problem.

Thank you, Ravikiran C K

juagarmar commented 7 years ago

Hi Suresh ,

Check this example; maybe it could help you.

Set up the environment

Sys.setenv(HADOOP_CMD='/usr/bin/hadoop')
Sys.setenv(HADOOP_HOME='/usr/lib/hadoop-0.20-mapreduce')
Sys.setenv(HADOOP_STREAMING='/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.7.1.jar')
library(rJava)
library(rmr2)
library(rhdfs)
hdfs.init()

Define the arguments 'x' & 'y'

table <- read.csv('http://archive.ics.uci.edu/ml/machine-learning-databases/00265/CASP.csv', sep=",")
table <- as.numeric(unlist(table))
table <- matrix(table, ncol=10)
X1 <- to.dfs(table)
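As a rough illustration of what the to.dfs call above produces, the following sketch round-trips a small matrix through rmr2. It assumes the rmr2 package is installed and uses the local backend (set via rmr.options) so that no cluster is needed; the matrix m and the variable back are hypothetical stand-ins for the CASP data, not part of the original example.

```r
library(rmr2)

# Assumption: run on rmr2's local backend so no Hadoop cluster is required.
rmr.options(backend = "local")

# Hypothetical stand-in for the 10-column numeric matrix built from CASP.csv.
m <- matrix(as.numeric(1:20), ncol = 10)

# to.dfs() writes the object out and returns a handle that can be passed
# as the input of a mapreduce() job.
X1 <- to.dfs(m)

# from.dfs() reads the data back as a key-value pair; values() extracts
# the value part, which should again be a matrix.
back <- values(from.dfs(X1))
print(dim(back))
```

Note that this pattern moves an in-memory R object into the DFS; it does not by itself open an existing binary file that is already stored on HDFS.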

good luck

Regards

Juan