RevolutionAnalytics / rhdfs

A package that allows R developers to use Hadoop HDFS

reading output of Hadoop job from RHDFS problem! #2

Closed canil closed 11 years ago

canil commented 11 years ago

Hi, I ran kmeans.R and the job finished successfully, but the output file cannot be read or reached. It always tells me that access is not permitted. I even tried sudo. The output is in /tmp/Rtmpm1Mc34/filef281df27152. How can I see the output, from R or from the terminal?

Secondly, how can I set the output file in HDFS? I mean, when I do hdfs.ls(".") it should be listed.

13/02/24 21:57:18 INFO streaming.StreamJob: Job complete: job_201302242120_0006
13/02/24 21:57:18 INFO streaming.StreamJob: Output: /tmp/Rtmpm1Mc3/filef281df27152

Thanks in advance. (I hope this is the right place this time.)

piccolbo commented 11 years ago

See help(big.data.object). Also, I would be wary of using relative paths for HDFS paths, as there isn't really a concept of a current directory in any of the tools I know. As for where to submit this issue, it seems to me that it is more rmr2-related, so it should go on that project's issue tracker, but I gave you the benefit of the doubt.

canil commented 11 years ago

OK, I see, but the problem is that for /tmp/Rtmpm1Mc34/filef281df27152, there is no file named filef281df27152 in that directory. How is that possible?

canil commented 11 years ago

canil@ubuntu:/tmp/Rtmpm1Mc34$ ls

filef2812eeefd9
filef28196444ff
filef281a05cbac
filef282b2eebb5
filef2836109aa7
filef2844add552
filef28496f38dc
filef284ee1e46a
filef285dd7a9a7
filef285f98d740
filef2866c462db
filef2871bc06be
filef287d5dbe04

rmr-global-envf281111658
rmr-global-envf281a5bc8f0
rmr-global-envf281bda0c7e
rmr-global-envf283ef7d220
rmr-global-envf2855ebf052
rmr-global-envf289a2ce91
rmr-local-envf28114a13c7
rmr-local-envf2813344c9e
rmr-local-envf281b1206f1
rmr-local-envf282e53c271
rmr-local-envf28695c38b2
rmr-local-envf2875f0e17d
rmr-streaming-combinef281c63ecdf
rmr-streaming-combinef28297d5d7
rmr-streaming-combinef28525259f3
rmr-streaming-combinef285ee8bfff
rmr-streaming-combinef28682ac069
rmr-streaming-combinef287ae4d0b8
rmr-streaming-mapf2811e266b0
rmr-streaming-mapf281e802579
rmr-streaming-mapf28275a312d
rmr-streaming-mapf2839cc7dd0
rmr-streaming-mapf284c50b3eb
rmr-streaming-mapf285a7c9a07
rmr-streaming-reducef282516b34f
rmr-streaming-reducef2837f69102
rmr-streaming-reducef284b6a0360
rmr-streaming-reducef2874cf8b58
rmr-streaming-reducef288dfd156
rmr-streaming-reducef28ea7d1cc

piccolbo commented 11 years ago

Where are you looking for it? Do you understand that multiple file systems exist that do not share a root?
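The point about separate file systems can be checked from a terminal. The following is a minimal sketch, not a documented workflow: it assumes the `hadoop` command-line client is on the PATH and that the `Output:` path in the streaming log refers to HDFS rather than the local disk, which the thread implies but does not state outright. The log line is copied from this thread; the variable names are illustrative.

```shell
#!/bin/sh
# Log line copied from the thread; the Output: path is assumed to be an HDFS path.
log_line='13/02/24 21:57:18 INFO streaming.StreamJob: Output: /tmp/Rtmpm1Mc34/filef281df27152'

# Strip everything up to and including "Output: " to get the path itself.
out_path=${log_line##*"Output: "}

# A plain ls consults the local file system, where this path may not exist.
ls "$out_path" 2>/dev/null || echo "not found on the local file system: $out_path"

# hadoop fs -ls consults HDFS, which has its own root and its own /tmp.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$out_path"
else
  echo "hadoop not on PATH; would run: hadoop fs -ls $out_path"
fi
```

This is for diagnosis only. Temporary output locations like this one are chosen by the library, so a script should not depend on them.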

canil commented 11 years ago

I am looking in the location that the Hadoop output refers to:

13/02/24 21:57:00 INFO streaming.StreamJob: map 100% reduce 0%
13/02/24 21:57:12 INFO streaming.StreamJob: map 100% reduce 100%
13/02/24 21:57:18 INFO streaming.StreamJob: Job complete: job_201302242120_0006
13/02/24 21:57:18 INFO streaming.StreamJob: Output: /tmp/Rtmpm1Mc34/filef281df27152

Output: /tmp/Rtmpm1Mc34/filef281df27152

So when I go into this directory, there is no file named "filef281df27152". I ran the ls command above. What am I doing wrong?

piccolbo commented 11 years ago

Sorry, I was taking the wrong approach here. The correct answer is: those filenames are not part of the API. They are not documented. As a user, you can't use them. They are an implementation detail. They could be anywhere and everywhere and moved as necessary. End of story. Please read the tutorial and the help. Please use the library the way it is intended to be used, or it won't work, or at the very least you can't get support when you go outside the documented API. Moreover, the issue tracker is not a substitute for training. You have to read the documentation first. If you want personalized training, Revolution Analytics offers paid training services. Thanks

Antonio

canil commented 11 years ago

Ok thanks.

fonsoim commented 11 years ago

I have a similar problem.

data = to.dfs(1:10)
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
print(res())

[1] "/tmp/Rtmpr5Xv1g/file34916a6426bf"

And then....

from.dfs(res)

Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs ... ...

Where is the problem?

Thanks in advance

Alfonso