fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

`Frame.fillMissing` does not work with `Frame.indexRowsWith` #498

Open hoangdungt2 opened 4 years ago

hoangdungt2 commented 4 years ago

Reproduce bug:

let df = 
    [
       {| Value1 = 1.; Value2 = 2. |}
       {| Value1 = 2.; Value2 = 3. |}
    ] |> Frame.ofRecords
let dfi = Frame.indexRowsWith [0..2] df
dfi.Print()
     Value1    Value2    
0 -> 1         2
1 -> 2         3
2 -> <missing> <missing> 

Filling with the integer 0, `(Frame.fillMissingWith 0 dfi).Print()`, yields no change:

     Value1    Value2    
0 -> 1         2
1 -> 2         3
2 -> <missing> <missing>

Interestingly, `Frame.fillMissingWith 0.0 dfi` yields an error:

System.InvalidOperationException: Index and vector of a series should have the same length!
   at Deedle.Series`2..ctor(IIndex`1 index, IVector`1 vector, IVectorBuilder vectorBuilder, IIndexBuilder indexBuilder) in C:\FSharp\fslaborg\Deedle\src\Deedle\Series.fs:line 63

Workaround: fill the missing values row by row

dfi |> Frame.mapRowValues (Series.fillMissingWith 0.0) |> Frame.ofRows

yields

     Value1 Value2 
0 -> 1      2
1 -> 2      3
2 -> 0      0

This came up while I was trying to implement a `zipAll` that works similarly to this but takes a `Frame[]` as input.

let zipAll (dfs:Frame<_,_>[]) = 
    let outerKeys = dfs |> Array.collect (fun df -> df.RowKeys |> Array.ofSeq) |> Array.distinct
    let dfsNew = 
        dfs
        |> Array.map (Frame.indexRowsWith outerKeys >> Frame.mapRowValues (Series.fillMissingWith 0.) >> Frame.ofRows)
    Array.fold (Frame.zip (+)) (Array.head dfsNew) (Array.tail dfsNew)

zyzhu commented 4 years ago

Thanks for reporting this. This is indeed a bug. Either we should limit the number of new row keys, similar to `indexColsWith` (https://github.com/fslaborg/Deedle/blob/master/src/Deedle/FrameModule.fs#L765), or we should convert the expanded rows to optional values.