RevolutionAnalytics / rmr2

A package that allows R developers to use Hadoop MapReduce

output is in sequence format #140

Closed RajkumarB closed 10 years ago

RajkumarB commented 10 years ago

After installing RHadoop (the rmr2 and rhdfs packages) as suggested, I ran a small example as follows.

small.ints = to.dfs(1:10)
Warning: $HADOOP_HOME is deprecated.

14/09/10 08:16:17 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/09/10 08:16:17 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/09/10 08:16:17 INFO compress.CodecPool: Got brand-new compressor

mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
Warning: $HADOOP_HOME is deprecated.

packageJobJar: [/home/mlcoeadmin/rajkumar/hadoop-1.2.1/temp/hadoop-unjar6616713832785091129/] [] /tmp/streamjob8123934586004586291.jar tmpDir=null
14/09/10 08:16:35 INFO mapred.FileInputFormat: Total input paths to process : 1
14/09/10 08:16:36 INFO streaming.StreamJob: getLocalDirs(): [/home/mlcoeadmin/rajkumar/hadoop-1.2.1/temp/mapred/local]
14/09/10 08:16:36 INFO streaming.StreamJob: Running job: job_201409100815_0001
14/09/10 08:16:36 INFO streaming.StreamJob: To kill this job, run:
14/09/10 08:16:36 INFO streaming.StreamJob: /home/mlcoeadmin/rajkumar/hadoop-1.2.1/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201409100815_0001
14/09/10 08:16:36 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201409100815_0001
14/09/10 08:16:37 INFO streaming.StreamJob: map 0% reduce 0%
14/09/10 08:16:50 INFO streaming.StreamJob: map 100% reduce 0%
14/09/10 08:16:55 INFO streaming.StreamJob: map 100% reduce 100%
14/09/10 08:16:55 INFO streaming.StreamJob: Job complete: job_201409100815_0001
14/09/10 08:16:55 INFO streaming.StreamJob: Output: /tmp/file91f7466a911
function () { fname } <environment: 0x29765c0>

It ran fine. When I checked the output folder, it contained these files:

hdfs.ls("/tmp/file91f7466a911")
  permission      owner      group size          modtime                            file
1 -rw-r--r-- mlcoeadmin supergroup    0 2014-09-10 08:16    /tmp/file91f7466a911/_SUCCESS
2 drwxr-xr-x mlcoeadmin supergroup    0 2014-09-10 08:16       /tmp/file91f7466a911/_logs
3 -rw-r--r-- mlcoeadmin supergroup  122 2014-09-10 08:16 /tmp/file91f7466a911/part-00000
4 -rw-r--r-- mlcoeadmin supergroup  797 2014-09-10 08:16 /tmp/file91f7466a911/part-00001

But the output files are in sequence-file format. They look like this:

hdfs.cat("/tmp/file91f7466a911/part-00000")
[1] "SEQ\006/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritable\xc0\x80\xc0\x80..." [binary typedbytes payload truncated]

hdfs.cat("/tmp/file91f7466a911/part-00001")
[1] "SEQ\006/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritable\xc0\x80\xc0\x80..." [binary typedbytes payload truncated]

Which file is the actual output, and how can I convert it into a readable format?
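[Editor's note: the thread above never answers the technical question directly. For readers landing here, rmr2's own `from.dfs` function deserializes the typedbytes sequence files it writes; a minimal sketch of the round trip, assuming a working rmr2/HDFS setup like the one in the session above:]

```r
library(rmr2)

# Write the input vector to HDFS in rmr2's native (typedbytes) format;
# to.dfs() returns a big.data object pointing at the temporary file.
small.ints <- to.dfs(1:10)

# mapreduce() likewise returns a big.data object pointing at the output
# directory (the /tmp/fileXXXX path shown in the streaming log).
result <- mapreduce(input = small.ints,
                    map = function(k, v) cbind(v, v^2))

# from.dfs() reads the part-* sequence files back into R as a key/value
# list; the computed matrix is in the $val component. The raw part files
# are not meant to be human-readable via hdfs.cat().
out <- from.dfs(result)
out$val
```

`from.dfs` can also be pointed at the output path directly, e.g. `from.dfs("/tmp/file91f7466a911")`, since both part files together make up the result.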

piccolbo commented 10 years ago

Hey, let's not flood the airwaves! You filed two issues and started a thread on the exact same problem. Pick one and give me a day to respond.

RajkumarB commented 10 years ago

Sorry about that. I opened it under "rmr" by mistake; after realizing that, I posted the same thing under "rhadoop" to avoid confusion.

piccolbo commented 10 years ago

We decided to retire the RHadoop issue tracker after we created repos for each package, so if you use that one you may get the silent treatment. The Google group and the per-package issue trackers are all fine. Thanks for closing the duplicate issues.
