RevolutionAnalytics / RHadoop

RHadoop
https://github.com/RevolutionAnalytics/RHadoop/wiki

Rhadoop package- rmr failed to run a simple function #186

Closed fion8828 closed 11 years ago

fion8828 commented 11 years ago

Hi Everyone,

I'm new to RHadoop and R. For background on this error: I deployed a Hadoop cluster via Cloudera Manager, and I've successfully implemented R, rmr2 and rhdfs. However, it fails to run this command: "ints = to.dfs(1:100)", giving me this error: "sh: 1: /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop/: not found". I'm pretty sure I've set $HADOOP_HOME and $HADOOP_CMD.

Thanks in advance for thoughts, suggestions!

Fiona

piccolbo commented 11 years ago

Hi Fiona, could you enter this R expression

Sys.getenv("HADOOP_CMD")

and report back on the output? Second, could you verify whether that path is correct? At the Unix prompt, run

ls -l /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop/

and please share the results.

Third, rmr2 needs either HADOOP_HOME or the pair HADOOP_CMD and HADOOP_STREAMING. The first way is the old approach and it needs the Hadoop layout to follow certain expectations. It turns out that modern distributions don't all follow them, so we switched to the system with two variables. You don't mention HADOOP_STREAMING, so I thought I would remind you of that. HADOOP_HOME is still needed by other RHadoop packages, but not by rmr2 if you have the other two. It doesn't seem to be the problem here, but it will be later, because to.dfs uses streaming when writing binary formats.
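A minimal sketch of what that looks like (the paths below are just examples, not your CDH paths; use the ones from your own distribution, and set the variables before loading rmr2):

Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")  # example path: the hadoop executable itself, not a directory
Sys.setenv(HADOOP_STREAMING = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar")  # example path: the streaming jar
library(rmr2)
ints = to.dfs(1:100)  # needs the streaming jar when writing binary formats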

Fourth, just a terminology check: what you call a "successful implementation" here we call a "failed installation". Not to be pedantic, but it felt a little like translating from Linear A. Using standard terminology will help us understand each other.

Antonio

fion8828 commented 11 years ago

Hi Antonio,

I did the three steps you suggested. The output for the first step is:

> Sys.getenv("HADOOP_CMD")
[1] "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop/"

The output for the second step is:

dlabadmin@ub12hdpmaster:~$ ls -l /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop
-rwxr-xr-x 1 root root 790 Apr 22 17:19 /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop

The third step output is:

Sys.getenv("HADOOP_CMD") [1] "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop/" Sys.setenv(HADOOP_STREAMING="/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.1.jar") library(rmr2) library(rhdfs) ints = to.dfs(1:100) sh: 1: /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop/: not found Warning message: In to.dfs(1:100) : Converting to.dfs argument to keyval with a NULL key

Still got the same error.

Are there any further steps I should take?

Thanks,

Fiona

piccolbo commented 11 years ago

On Thu, May 2, 2013 at 11:44 AM, fion8828 notifications@github.com wrote:

> Hi Antonio,
>
> I did the three steps you suggested. The output for the first step is:
>
> Sys.getenv("HADOOP_CMD")
> [1] "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop/"

So this has an extra slash at the end; you need to fix that.

> The output for the second step is:
>
> dlabadmin@ub12hdpmaster:~$ ls -l /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop
> -rwxr-xr-x 1 root root 790 Apr 22 17:19 /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop

This is the correct value for the above.

> The third step output is:
>
> Sys.getenv("HADOOP_CMD")
> [1] "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop/"
> Sys.setenv(HADOOP_STREAMING="/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.1.jar")
> library(rmr2)
> library(rhdfs)
> ints = to.dfs(1:100)
>
> sh: 1: /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop/: not found
> Warning message:
> In to.dfs(1:100) : Converting to.dfs argument to keyval with a NULL key

> Still got the same error.

Yes, even one character off can do it.

> Are there any further steps I should take?

Fix the value for HADOOP_CMD. It is not a directory and it does not terminate with a slash.
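For example, if you are setting it from R (you could equally set it in your shell environment before starting R), something along these lines should do it, using the same path your ls output confirmed, minus the trailing slash:

Sys.setenv(HADOOP_CMD = "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/bin/hadoop")  # the executable itself, no trailing slash
Sys.getenv("HADOOP_CMD")  # verify the new value before loading rmr2 again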

Antonio


fion8828 commented 11 years ago

lol, thank you soooo much Antonio!! This "/" almost killed me! But now the error has disappeared!