RevolutionAnalytics / RHadoop

RHadoop
https://github.com/RevolutionAnalytics/RHadoop/wiki

unserialize() fails on hdfs.read() output, and the bytes differ from serialize() output. Why? #243

Closed: meteorwen closed this issue 6 years ago

meteorwen commented 6 years ago

Why does unserialize() raise the following error when it is called on the result of hdfs.read()?

Error in unserialize(n1) : unknown input format

iris.csv on HDFS, at hdfs:///user/dsg/iris.csv:

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa
5.4,3.7,1.5,0.2,setosa
4.8,3.4,1.6,0.2,setosa
4.8,3,1.4,0.1,setosa
4.3,3,1.1,0.1,setosa
5.8,4,1.2,0.2,setosa
5.7,4.4,1.5,0.4,setosa
5.4,3.9,1.3,0.4,setosa
5.1,3.5,1.4,0.3,setosa
5.7,3.8,1.7,0.3,setosa
5.1,3.8,1.5,0.3,setosa
5.4,3.4,1.7,0.2,setosa
5.1,3.7,1.5,0.4,setosa
4.6,3.6,1,0.2,setosa
5.1,3.3,1.7,0.5,setosa
4.8,3.4,1.9,0.2,setosa
5,3,1.6,0.2,setosa

My code:

> p1 <- "hdfs:///user/dsg/iris.csv"
> p2 <- "hdfs:///user/dsg/res.csv"
> require(rJava);
>   require(dplyr);
>   require(magrittr);
>   Sys.setenv(HADOOP_CMD="/opt/cloudera/parcels/CDH-5.8.5-1.cdh5.8.5.p0.5/bin/hadoop");
>   Sys.setenv(HADOOP_STREAMING="/opt/cloudera/parcels/CDH-5.8.5-1.cdh5.8.5.p0.5/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.5.jar");
>   require(rhdfs);
>   hdfs.init();
>   path1 <- hdfs.file(p1, "r");
>   n1 <- hdfs.read(path1)
> n1
   [1] 53 65 70 61 6c 2e 4c 65 6e 67 74 68 2c 53 65 70 61 6c 2e 57 69 64 74 68 2c 50 65 74 61 6c
  [31] 2e 4c 65 6e 67 74 68 2c 50 65 74 61 6c 2e 57 69 64 74 68 2c 53 70 65 63 69 65 73 0a 35 2e
  [61] 31 2c 33 2e 35 2c 31 2e 34 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 34 2e 39 2c 33 2c 31 2e 34
  [91] 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 34 2e 37 2c 33 2e 32 2c 31 2e 33 2c 30 2e 32 2c 73 65
 [121] 74 6f 73 61 0a 34 2e 36 2c 33 2e 31 2c 31 2e 35 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 35 2c
 [151] 33 2e 36 2c 31 2e 34 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 35 2e 34 2c 33 2e 39 2c 31 2e 37
 [181] 2c 30 2e 34 2c 73 65 74 6f 73 61 0a 34 2e 36 2c 33 2e 34 2c 31 2e 34 2c 30 2e 33 2c 73 65
 [211] 74 6f 73 61 0a 35 2c 33 2e 34 2c 31 2e 35 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 34 2e 34 2c
 [241] 32 2e 39 2c 31 2e 34 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 34 2e 39 2c 33 2e 31 2c 31 2e 35
 [271] 2c 30 2e 31 2c 73 65 74 6f 73 61 0a 35 2e 34 2c 33 2e 37 2c 31 2e 35 2c 30 2e 32 2c 73 65
 [301] 74 6f 73 61 0a 34 2e 38 2c 33 2e 34 2c 31 2e 36 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 34 2e
 [331] 38 2c 33 2c 31 2e 34 2c 30 2e 31 2c 73 65 74 6f 73 61 0a 34 2e 33 2c 33 2c 31 2e 31 2c 30
 [361] 2e 31 2c 73 65 74 6f 73 61 0a 35 2e 38 2c 34 2c 31 2e 32 2c 30 2e 32 2c 73 65 74 6f 73 61
 [391] 0a 35 2e 37 2c 34 2e 34 2c 31 2e 35 2c 30 2e 34 2c 73 65 74 6f 73 61 0a 35 2e 34 2c 33 2e
 [421] 39 2c 31 2e 33 2c 30 2e 34 2c 73 65 74 6f 73 61 0a 35 2e 31 2c 33 2e 35 2c 31 2e 34 2c 30
 [451] 2e 33 2c 73 65 74 6f 73 61 0a 35 2e 37 2c 33 2e 38 2c 31 2e 37 2c 30 2e 33 2c 73 65 74 6f
 [481] 73 61 0a 35 2e 31 2c 33 2e 38 2c 31 2e 35 2c 30 2e 33 2c 73 65 74 6f 73 61 0a 35 2e 34 2c
 [511] 33 2e 34 2c 31 2e 37 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 35 2e 31 2c 33 2e 37 2c 31 2e 35
 [541] 2c 30 2e 34 2c 73 65 74 6f 73 61 0a 34 2e 36 2c 33 2e 36 2c 31 2c 30 2e 32 2c 73 65 74 6f
 [571] 73 61 0a 35 2e 31 2c 33 2e 33 2c 31 2e 37 2c 30 2e 35 2c 73 65 74 6f 73 61 0a 34 2e 38 2c
 [601] 33 2e 34 2c 31 2e 39 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 35 2c 33 2c 31 2e 36 2c 30 2e 32
 [631] 2c 73 65 74 6f 73 61 0a 35 2c 33 2e 34 2c 31 2e 36 2c 30 2e 34 2c 73 65 74 6f 73 61 0a 35
 [661] 2e 32 2c 33 2e 35 2c 31 2e 35 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 35 2e 32 2c 33 2e 34 2c
 [691] 31 2e 34 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 34 2e 37 2c 33 2e 32 2c 31 2e 36 2c 30 2e 32
 [721] 2c 73 65 74 6f 73 61 0a 34 2e 38 2c 33 2e 31 2c 31 2e 36 2c 30 2e 32 2c 73 65 74 6f 73 61
 [751] 0a 35 2e 34 2c 33 2e 34 2c 31 2e 35 2c 30 2e 34 2c 73 65 74 6f 73 61 0a 35 2e 32 2c 34 2e
 [781] 31 2c 31 2e 35 2c 30 2e 31 2c 73 65 74 6f 73 61 0a 35 2e 35 2c 34 2e 32 2c 31 2e 34 2c 30
 [811] 2e 32 2c 73 65 74 6f 73 61 0a 34 2e 39 2c 33 2e 31 2c 31 2e 35 2c 30 2e 32 2c 73 65 74 6f
 [841] 73 61 0a 35 2c 33 2e 32 2c 31 2e 32 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 35 2e 35 2c 33 2e
 [871] 35 2c 31 2e 33 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 34 2e 39 2c 33 2e 36 2c 31 2e 34 2c 30
 [901] 2e 31 2c 73 65 74 6f 73 61 0a 34 2e 34 2c 33 2c 31 2e 33 2c 30 2e 32 2c 73 65 74 6f 73 61
 [931] 0a 35 2e 31 2c 33 2e 34 2c 31 2e 35 2c 30 2e 32 2c 73 65 74 6f 73 61 0a 35 2c 33 2e 35 2c
 [961] 31 2e 33 2c 30 2e 33 2c 73 65 74 6f 73 61 0a 34 2e 35 2c 32 2e 33 2c 31 2e 33 2c 30 2e 33
 [991] 2c 73 65 74 6f 73 61 0a 34 2e
 [ reached getOption("max.print") -- omitted 2716 entries ]

> n2 <- unserialize(n1)
Error in unserialize(n1) : unknown input format
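
The bytes printed for n1 above are ASCII text: 53 65 70 61 6c spells "Sepal", the start of the CSV header. A raw vector like that holds the file's characters, not an R object written by serialize(), which may be why unserialize() rejects it. The following is a minimal sketch of decoding such bytes with base R only. It assumes the raw vector holds the complete CSV text, and it builds a small stand-in for n1 locally (the name csv_text is hypothetical) so that it runs without a Hadoop cluster.

```r
# Stand-in for the raw vector returned by hdfs.read(): the bytes of a small CSV.
# Assumption: built locally here so the sketch runs without HDFS.
csv_text <- paste0(
  "Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species\n",
  "5.1,3.5,1.4,0.2,setosa\n",
  "4.9,3,1.4,0.2,setosa\n"
)
n1 <- charToRaw(csv_text)

# The leading bytes are plain ASCII characters, as in the output above.
print(n1[1:5])
print(rawToChar(n1[1:5]))

# Decode the bytes to a single string, then parse that string as CSV.
df <- read.csv(text = rawToChar(n1), stringsAsFactors = FALSE)
print(df)
```

With the real n1 from the session above, the same rawToChar() and read.csv(text = ...) steps would apply, provided the bytes end on a complete line.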

> serialize(iris, NULL)
   [1] 58 0a 00 00 00 02 00 03 04 02 00 02 03 00 00 00 03 13 00 00 00 05 00 00 00 0e 00 00 00 96
  [31] 40 14 66 66 66 66 66 66 40 13 99 99 99 99 99 9a 40 12 cc cc cc cc cc cd 40 12 66 66 66 66
  [61] 66 66 40 14 00 00 00 00 00 00 40 15 99 99 99 99 99 9a 40 12 66 66 66 66 66 66 40 14 00 00
  [91] 00 00 00 00 40 11 99 99 99 99 99 9a 40 13 99 99 99 99 99 9a 40 15 99 99 99 99 99 9a 40 13
 [121] 33 33 33 33 33 33 40 13 33 33 33 33 33 33 40 11 33 33 33 33 33 33 40 17 33 33 33 33 33 33
 [151] 40 16 cc cc cc cc cc cd 40 15 99 99 99 99 99 9a 40 14 66 66 66 66 66 66 40 16 cc cc cc cc
 [181] cc cd 40 14 66 66 66 66 66 66 40 15 99 99 99 99 99 9a 40 14 66 66 66 66 66 66 40 12 66 66
 [211] 66 66 66 66 40 14 66 66 66 66 66 66 40 13 33 33 33 33 33 33 40 14 00 00 00 00 00 00 40 14
 [241] 00 00 00 00 00 00 40 14 cc cc cc cc cc cd 40 14 cc cc cc cc cc cd 40 12 cc cc cc cc cc cd
 [271] 40 13 33 33 33 33 33 33 40 15 99 99 99 99 99 9a 40 14 cc cc cc cc cc cd 40 16 00 00 00 00
 [301] 00 00 40 13 99 99 99 99 99 9a 40 14 00 00 00 00 00 00 40 16 00 00 00 00 00 00 40 13 99 99
 [331] 99 99 99 9a 40 11 99 99 99 99 99 9a 40 14 66 66 66 66 66 66 40 14 00 00 00 00 00 00 40 12
 [361] 00 00 00 00 00 00 40 11 99 99 99 99 99 9a 40 14 00 00 00 00 00 00 40 14 66 66 66 66 66 66
 [391] 40 13 33 33 33 33 33 33 40 14 66 66 66 66 66 66 40 12 66 66 66 66 66 66 40 15 33 33 33 33
 [421] 33 33 40 14 00 00 00 00 00 00 40 1c 00 00 00 00 00 00 40 19 99 99 99 99 99 9a 40 1b 99 99
 [451] 99 99 99 9a 40 16 00 00 00 00 00 00 40 1a 00 00 00 00 00 00 40 16 cc cc cc cc cc cd 40 19
 [481] 33 33 33 33 33 33 40 13 99 99 99 99 99 9a 40 1a 66 66 66 66 66 66 40 14 cc cc cc cc cc cd
 [511] 40 14 00 00 00 00 00 00 40 17 99 99 99 99 99 9a 40 18 00 00 00 00 00 00 40 18 66 66 66 66
 [541] 66 66 40 16 66 66 66 66 66 66 40 1a cc cc cc cc cc cd 40 16 66 66 66 66 66 66 40 17 33 33
 [571] 33 33 33 33 40 18 cc cc cc cc cc cd 40 16 66 66 66 66 66 66 40 17 99 99 99 99 99 9a 40 18
 [601] 66 66 66 66 66 66 40 19 33 33 33 33 33 33 40 18 66 66 66 66 66 66 40 19 99 99 99 99 99 9a
 [631] 40 1a 66 66 66 66 66 66 40 1b 33 33 33 33 33 33 40 1a cc cc cc cc cc cd 40 18 00 00 00 00
 [661] 00 00 40 16 cc cc cc cc cc cd 40 16 00 00 00 00 00 00 40 16 00 00 00 00 00 00 40 17 33 33
 [691] 33 33 33 33 40 18 00 00 00 00 00 00 40 15 99 99 99 99 99 9a 40 18 00 00 00 00 00 00 40 1a
 [721] cc cc cc cc cc cd 40 19 33 33 33 33 33 33 40 16 66 66 66 66 66 66 40 16 00 00 00 00 00 00
 [751] 40 16 00 00 00 00 00 00 40 18 66 66 66 66 66 66 40 17 33 33 33 33 33 33 40 14 00 00 00 00
 [781] 00 00 40 16 66 66 66 66 66 66 40 16 cc cc cc cc cc cd 40 16 cc cc cc cc cc cd 40 18 cc cc
 [811] cc cc cc cd 40 14 66 66 66 66 66 66 40 16 cc cc cc cc cc cd 40 19 33 33 33 33 33 33 40 17
 [841] 33 33 33 33 33 33 40 1c 66 66 66 66 66 66 40 19 33 33 33 33 33 33 40 1a 00 00 00 00 00 00
 [871] 40 1e 66 66 66 66 66 66 40 13 99 99 99 99 99 9a 40 1d 33 33 33 33 33 33 40 1a cc cc cc cc
 [901] cc cd 40 1c cc cc cc cc cc cd 40 1a 00 00 00 00 00 00 40 19 99 99 99 99 99 9a 40 1b 33 33
 [931] 33 33 33 33 40 16 cc cc cc cc cc cd 40 17 33 33 33 33 33 33 40 19 99 99 99 99 99 9a 40 1a
 [961] 00 00 00 00 00 00 40 1e cc cc cc cc cc cd 40 1e cc cc cc cc cc cd 40 18 00 00 00 00 00 00
 [991] 40 1b 99 99 99 99 99 9a 40 16

# Both use the iris data, so why are the serialized results different?
[1] 53 65 70 61 6c 2e 4c 65 6e 67 74 68 2c 53 65 70 61
[1] 58 0a 00 00 00 02 00 03 04 02 00 02 03 00 00
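
One way to read the two byte sequences: the first is the CSV file's text, one byte per character, while the second is R's binary serialization of the in-memory data frame, which begins with the marker 58 0a ("X" followed by a newline). The sketch below, using only base R and the built-in iris data set, compares the two encodings and shows that unserialize() round-trips only the second kind. The variable names are illustrative.

```r
# Bytes of CSV text: each character becomes one byte.
text_bytes <- charToRaw("Sepal.Length")
print(text_bytes)

# Bytes of R's binary serialization of an object.
ser_bytes <- serialize(iris, NULL)
print(ser_bytes[1:2])

# unserialize() reverses serialize(), so this round trip recovers the data frame.
restored <- unserialize(ser_bytes)
print(identical(restored, iris))

# Text bytes are not a serialization stream, so unserialize() signals an error.
msg <- tryCatch(unserialize(text_bytes), error = function(e) conditionMessage(e))
print(msg)
```

The two outputs in the question are therefore different encodings of the same data rather than inconsistent results of one function.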

n2 <- unserialize(n1) # Error in unserialize(n1) : unknown input format