fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

GroupRowsBy "Age" works incorrectly with the Titanic dataset #253

Closed: ntr closed this issue 6 years ago

ntr commented 10 years ago

I am using the following code with the Kaggle Titanic dataset:

open Deedle
let titanic = Frame.ReadCsv("c:\\tmp\\titanic.csv")
let byAge = titanic.GroupRowsBy<int>("Age")

It produces unexpected results: the age in the row keys does not match the age in the row values:

[screenshot: titanicissue]

I suspect the issue occurs because some Age values are missing, since this code works correctly:

titanic.DropSparseRows().GroupRowsBy<int>("Age")
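To see how missing values could make group keys disagree with row values, here is a hypothetical plain F# illustration. It does not use Deedle and is not Deedle's actual implementation; the data, `correctKeys`, and `naiveKeys` are made up for this sketch. The "naive" version collects the non-missing values first, losing their positions, and then pairs them with row keys in order, so every row after the first missing value gets its neighbour's value as its key.

```fsharp
// Hypothetical data: (row key, optional "Age" value); None marks a missing value.
let rows = [ (1, Some 30); (2, None); (3, Some 45); (4, Some 12) ]

// Correct pairing: each row keeps its own value; rows with a missing value are skipped.
let correctKeys =
    rows |> List.choose (fun (rowKey, age) -> age |> Option.map (fun a -> (a, rowKey)))

// Hypothetical misaligned pairing: non-missing values are collected first (positions lost)
// and then zipped with the row keys in order, so values shift onto the wrong rows.
let naiveKeys =
    let present = rows |> List.choose snd
    let rowKeys = rows |> List.map fst
    List.zip present (List.truncate present.Length rowKeys)

printfn "correct: %A" correctKeys // [(30, 1); (45, 3); (12, 4)]
printfn "naive:   %A" naiveKeys   // [(30, 1); (45, 2); (12, 3)]
```

In the naive result, row 2 (whose value is missing) is keyed by 45 and row 3 by 12, which matches the symptom in the screenshot: keys that do not match the values in the rows.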
sebhofer commented 6 years ago

I ran into the same behaviour. Minimal working example (MWE):

// expected behaviour: every row has a value in "bar"
Frame.ofValues [ (1,"foo","a"); (1,"bar","b"); (3,"foo","d"); (3,"bar","e"); (4,"bar","f")]
|> Frame.groupRowsByString "bar"

// unexpected behaviour: row 2 has a "foo" value but no "bar" value
Frame.ofValues [ (1,"foo","a"); (1,"bar","b"); (2,"foo","c"); (3,"foo","d"); (3,"bar","e"); (4,"bar","f")]
|> Frame.groupRowsByString "bar"
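For versions without the fix, a narrower workaround than `DropSparseRows` might be to drop only the rows whose grouping column is missing. `DropSparseRows` removes a row if any column is missing, so on the MWE it would also discard row 4, which has a "bar" value. The sketch below assumes a recent F# Interactive (for `#r "nuget: ..."`) and uses `Frame.filterRows` with `Series.TryGet`; it is an illustration, not the fix from #405.

```fsharp
#r "nuget: Deedle"
open Deedle

// The second frame from the MWE: row 2 has "foo" but no "bar"; row 4 has "bar" but no "foo".
let frame =
    Frame.ofValues [ (1,"foo","a"); (1,"bar","b"); (2,"foo","c"); (3,"foo","d"); (3,"bar","e"); (4,"bar","f") ]

// Keep only rows where "bar" has a value (missing values in other columns are allowed),
// then group by "bar"; the resulting row keys are (bar value, original row key) pairs.
let grouped =
    frame
    |> Frame.filterRows (fun _ row -> row.TryGet("bar").HasValue)
    |> Frame.groupRowsByString "bar"

grouped.Print()
```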
sebhofer commented 6 years ago

Fixed by #405