RevolutionAnalytics / RHadoop
https://github.com/RevolutionAnalytics/RHadoop/wiki

output is in sequence format #214

Closed RajkumarB closed 10 years ago

RajkumarB commented 10 years ago

After installing RHadoop (the rmr2 and rhdfs packages) as suggested, I ran a small example as follows.

small.ints = to.dfs(1:10)
Warning: $HADOOP_HOME is deprecated.
14/09/10 08:16:17 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/09/10 08:16:17 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/09/10 08:16:17 INFO compress.CodecPool: Got brand-new compressor

mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
Warning: $HADOOP_HOME is deprecated.
packageJobJar: [/home/mlcoeadmin/rajkumar/hadoop-1.2.1/temp/hadoop-unjar6616713832785091129/] [] /tmp/streamjob8123934586004586291.jar tmpDir=null
14/09/10 08:16:35 INFO mapred.FileInputFormat: Total input paths to process : 1
14/09/10 08:16:36 INFO streaming.StreamJob: getLocalDirs(): [/home/mlcoeadmin/rajkumar/hadoop-1.2.1/temp/mapred/local]
14/09/10 08:16:36 INFO streaming.StreamJob: Running job: job_201409100815_0001
14/09/10 08:16:36 INFO streaming.StreamJob: To kill this job, run:
14/09/10 08:16:36 INFO streaming.StreamJob: /home/mlcoeadmin/rajkumar/hadoop-1.2.1/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201409100815_0001
14/09/10 08:16:36 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201409100815_0001
14/09/10 08:16:37 INFO streaming.StreamJob: map 0% reduce 0%
14/09/10 08:16:50 INFO streaming.StreamJob: map 100% reduce 0%
14/09/10 08:16:55 INFO streaming.StreamJob: map 100% reduce 100%
14/09/10 08:16:55 INFO streaming.StreamJob: Job complete: job_201409100815_0001
14/09/10 08:16:55 INFO streaming.StreamJob: Output: /tmp/file91f7466a911
function () { fname }

It ran nicely. When I checked the output folder, it contained these files:

hdfs.ls("/tmp/file91f7466a911")
  permission      owner      group size          modtime                            file
1 -rw-r--r-- mlcoeadmin supergroup    0 2014-09-10 08:16    /tmp/file91f7466a911/_SUCCESS
2 drwxr-xr-x mlcoeadmin supergroup    0 2014-09-10 08:16       /tmp/file91f7466a911/_logs
3 -rw-r--r-- mlcoeadmin supergroup  122 2014-09-10 08:16  /tmp/file91f7466a911/part-00000
4 -rw-r--r-- mlcoeadmin supergroup  797 2014-09-10 08:16  /tmp/file91f7466a911/part-00001

But the output files are in sequence format. They look like this:

hdfs.cat("/tmp/file91f7466a911/part-00000")
[1] "SEQ\006/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritable\xc0\x80\xc0\x80..."
hdfs.cat("/tmp/file91f7466a911/part-00001")
[1] "SEQ\006/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritable\xc0\x80\xc0\x80..."

(the remainder of each part file is unreadable TypedBytes binary data, omitted here)

Which is the actual output file, and how can I convert this output into a readable format?

Thanks in advance, Rajkumar.

gildastone commented 10 years ago

I get exactly the same problem and output. I'm running a 64-bit MapR distribution with the following configuration:

java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
$version.string
[1] "R version 3.1.1 (2014-07-10)"

Any help would be great!

ThiDiff.

RajkumarB commented 10 years ago

Hi ThiDiff,

Problem solved. Just pass one more argument, output.format, to the mapreduce function:

mapreduce(input = small.ints, output.format = "text", map = function(k, v) cbind(v, v^2))
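For anyone landing here later, the full round trip might look like the sketch below. This assumes a working rmr2/rhdfs setup (HADOOP_CMD and HADOOP_STREAMING visible to R); the output path /tmp/squares is just an illustrative name, not something from the original job.

```r
library(rmr2)
library(rhdfs)
hdfs.init()

# write the input vector to HDFS
small.ints <- to.dfs(1:10)

# run the job, asking rmr2 to write plain text instead of the
# default native (sequence-file / TypedBytes) format
out <- mapreduce(input = small.ints,
                 output = "/tmp/squares",   # illustrative path
                 output.format = "text",
                 map = function(k, v) cbind(v, v^2))

# the part-0000* files under /tmp/squares should now be human-readable
hdfs.cat("/tmp/squares/part-00000")
```

Note the trade-off: "text" output is convenient for eyeballing, but the default native format is what from.dfs and downstream mapreduce calls deserialize most easily.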

gildastone commented 10 years ago

Hi RajkumarB,

Thank you, that resolved my problem! I have one more question, about the from.dfs(...) function. When I try to do

from.dfs(mapreduce(input=to.dfs(1:10), output.format='text', map=function(k,v) cbind(v,v^2)))

I get the following error:

Error in if (file.exists(cmd)) return(cmd) : argument is of length zero

I don't really understand why. This also happens when I run a simple from.dfs(to.dfs(1:10)). Do you have any idea?
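A guess, not verified on MapR: that error comes from rmr2 trying to locate the hadoop binary, and it usually means the HADOOP_CMD environment variable is not visible inside the R session. The paths below are examples for a stock hadoop-1.2.1 layout and must be adjusted to your own installation:

```r
# example paths only -- point these at your actual Hadoop install
Sys.setenv(HADOOP_CMD = "/home/mlcoeadmin/rajkumar/hadoop-1.2.1/bin/hadoop")
Sys.setenv(HADOOP_STREAMING =
  "/home/mlcoeadmin/rajkumar/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar")

library(rmr2)

# with HADOOP_CMD set, from.dfs can shell out to hadoop and read the data back
from.dfs(to.dfs(1:10))
```

Separately, from.dfs reads the native format by default, so reading a job written with output.format = "text" would need a matching format argument on the from.dfs side; keeping the default native format for anything you intend to read back into R is simpler.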

Thank you in advance!