ddf-project / ddf-flink

DDF with Flink Implementation
Apache License 2.0

multi_factor doesn't work #56

Open PangZhi opened 9 years ago

PangZhi commented 9 years ago

I created a DDF from airline.csv and then computed a factor on the CancellationCode column. That produced the following error:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at io.ddf.flink.content.RepresentationHandler$$anonfun$io$ddf$flink$content$RepresentationHandler$$parseRow$1.apply(RepresentationHandler.scala:81)
    at io.ddf.flink.content.RepresentationHandler$$anonfun$io$ddf$flink$content$RepresentationHandler$$parseRow$1.apply(RepresentationHandler.scala:79)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at io.ddf.flink.content.RepresentationHandler$.io$ddf$flink$content$RepresentationHandler$$parseRow(RepresentationHandler.scala:79)
    at io.ddf.flink.content.RepresentationHandler$$anonfun$1.apply(RepresentationHandler.scala:64)
    at io.ddf.flink.content.RepresentationHandler$$anonfun$1.apply(RepresentationHandler.scala:64)
    at org.apache.flink.api.scala.DataSet$$anon$1.map(DataSet.scala:292)
    at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
    at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
    at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:177)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
    at java.lang.Thread.run(Thread.java:745)

The airline.csv is:

2008,1,3,4,2003,1955,2211,2225,WN,335,N712SW,128,150,116,-14,8,IAD,TPA,810,4,8,0,,0,NA,NA,NA,NA,NA
2009,1,3,4,754,735,1002,1000,WN,3231,N772SW,128,145,113,2,19,IAD,TPA,810,5,10,0,,0,NA,NA,NA,NA,NA
2010,3,3,4,628,620,804,750,WN,448,N428WN,96,90,76,14,8,IND,BWI,515,3,17,0,,0,NA,NA,NA,NA,NA
2008,2,3,4,926,930,1054,1100,WN,1746,N612SW,88,90,78,-6,-4,IND,BWI,515,3,7,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,1829,1755,1959,1925,WN,3920,N464WN,90,90,77,34,34,IND,BWI,515,3,10,0,,0,2,0,0,0,32
2008,3,3,4,1940,1915,2121,2110,WN,378,N726SW,101,115,87,11,25,IND,JAX,688,4,10,0,,0,NA,NA,NA,NA,NA
2008,11,3,4,1937,1830,2037,1940,WN,509,N763SW,240,250,230,57,67,IND,LAS,1591,3,7,0,,0,10,0,0,0,47
2008,10,3,4,1039,1040,1132,1150,WN,535,N428WN,233,250,219,-18,-1,IND,LAS,1591,7,7,0,,0,NA,NA,NA,NA,NA
2008,9,3,4,617,615,652,650,WN,11,N689SW,95,95,70,2,2,IND,MCI,451,6,19,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,1620,1620,1639,1655,WN,810,N648SW,79,95,70,-16,0,IND,MCI,451,3,6,0,,0,NA,NA,NA,NA,NA
2008,6,3,4,706,700,916,915,WN,100,N690SW,130,135,106,1,6,IND,MCO,828,5,19,0,,0,NA,NA,NA,NA,NA
2008,4,3,4,1644,1510,1845,1725,WN,1333,N334SW,121,135,107,80,94,IND,MCO,828,6,8,0,,0,8,0,0,0,72
2008,2,3,4,1426,1430,1426,1425,WN,829,N476WN,60,55,39,1,-4,IND,MDW,162,9,12,0,,0,NA,NA,NA,NA,NA
2008,3,3,4,715,715,720,710,WN,1016,N765SW,65,55,37,10,0,IND,MDW,162,7,21,0,,0,NA,NA,NA,NA,NA
2008,6,3,4,1702,1700,1651,1655,WN,1827,N420WN,49,55,35,-4,2,IND,MDW,162,4,10,0,,0,NA,NA,NA,NA,NA
2008,7,3,4,1029,1020,1021,1010,WN,2272,N263WN,52,50,37,11,9,IND,MDW,162,6,9,0,,0,NA,NA,NA,NA,NA
2008,8,3,4,1452,1425,1640,1625,WN,675,N286WN,228,240,213,15,27,IND,PHX,1489,7,8,0,,0,3,0,0,0,12
2008,1,3,4,754,745,940,955,WN,1144,N778SW,226,250,205,-15,9,IND,PHX,1489,5,16,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,1323,1255,1526,1510,WN,4,N674AA,123,135,110,16,28,IND,TPA,838,4,9,0,,0,0,0,0,0,16
2008,1,3,4,1416,1325,1512,1435,WN,54,N643SW,56,70,49,37,51,ISP,BWI,220,2,5,0,,0,12,0,0,0,25
2008,1,3,4,706,705,807,810,WN,68,N497WN,61,65,51,-3,1,ISP,BWI,220,3,7,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,1657,1625,1754,1735,WN,623,N724SW,57,70,47,19,32,ISP,BWI,220,5,5,0,,0,7,0,0,0,12
2008,1,3,4,1900,1840,1956,1950,WN,717,N786SW,56,70,49,6,20,ISP,BWI,220,2,5,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,1039,1030,1133,1140,WN,1244,N714CB,54,70,47,-7,9,ISP,BWI,220,2,5,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,801,800,902,910,WN,2101,N222WN,61,70,53,-8,1,ISP,BWI,220,3,5,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,1520,1455,1619,1605,WN,2553,N394SW,59,70,50,14,25,ISP,BWI,220,2,7,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,1422,1255,1657,1610,WN,188,N215WN,155,195,143,47,87,ISP,FLL,1093,6,6,0,,0,40,0,0,0,7
2008,1,3,4,1954,1925,2239,2235,WN,1754,N243WN,165,190,155,4,29,ISP,FLL,1093,3,7,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,636,635,921,945,WN,2275,N454WN,165,190,147,-24,1,ISP,FLL,1093,5,13,0,,0,NA,NA,NA,NA,NA
2008,5,3,4,734,730,958,1020,WN,550,N712SW,324,350,314,-22,4,ISP,LAS,2283,2,8,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,2107,1945,2334,2230,WN,362,N798SW,147,165,134,64,82,ISP,MCO,972,6,7,0,,0,5,0,0,0,59

Shiti commented 9 years ago

In the file that we are using, CancellationCode is column number 23, which has no values. What is the column number for CancellationCode in the table that you are creating?

@PangZhi is this similar to what you tried?

it should "compute factor" in {
  val airlineDDF = loadAirlineDDF()
  airlineDDF.setAsFactor("CancellationCode")
  println(airlineDDF.getSchema.getColumn("CancellationCode").getOptionalFactor.getLevels)
}

PangZhi commented 9 years ago

@Shiti Yes, it's just this column. In some situations a column will have no value at all, and we have to handle that as null or NA.
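
To make the null/NA handling concrete, here is a minimal sketch in plain Scala, assuming rows arrive as comma-separated strings. `NullSafeRow`, `parseRow`, and `factorLevels` are hypothetical names for illustration only; this is not the `RepresentationHandler.parseRow` from ddf-flink, and it does not claim to reproduce the cause of the exception above. It only shows one way empty and `NA` cells could be mapped to a missing value, so that indexing by column never goes out of bounds and an all-empty column yields an empty set of levels.

```scala
object NullSafeRow {
  // Treat empty cells and the literal "NA" as missing values.
  private def toOption(cell: String): Option[String] = cell.trim match {
    case "" | "NA" => None
    case value     => Some(value)
  }

  // Hypothetical helper: split one CSV line into exactly numColumns cells.
  // The limit -1 keeps trailing empty fields, which String.split would
  // otherwise drop; padTo fills short rows with empty (missing) cells.
  def parseRow(line: String, numColumns: Int): Vector[Option[String]] =
    line.split(",", -1).toVector
      .padTo(numColumns, "")
      .take(numColumns)
      .map(toOption)

  // Hypothetical helper: distinct non-missing values of one column
  // (0-based index), i.e. the levels a factor over that column would have.
  def factorLevels(rows: Seq[Vector[Option[String]]], column: Int): Set[String] =
    rows.flatMap(row => row(column)).toSet
}
```

With the first row of the airline.csv above, `NullSafeRow.parseRow(row, 29)` returns 29 cells, and the cell at index 22 (CancellationCode, column 23 when counting from 1) is `None`. Calling `factorLevels` on that column over such rows returns an empty set instead of failing.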