fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

Frame.ReadCsv with inferred types silently ignores small/scientific values #337

Open marklam opened 8 years ago

marklam commented 8 years ago

When column types are inferred from a CSV file and none of the rows sampled for inference contains a value in scientific notation, any later cell written in scientific notation is silently treated as a missing value. Take the following CSV (d:\temp\scientific.csv):

#,X,Y
Happy,7.058365954,5.754336636
Grumpy,6.62148607,9.51E-05

The following code reads Grumpy's Y value. In the inferTypes=true case, inferRows is set to 1 only to keep the sample CSV short; imagine that inference is not limited and the 'bad' value sits on line 500.

open Deedle

let (p : float) = Frame.ReadCsv(@"D:\temp\scientific.csv", hasHeaders=true, inferTypes=false)
                  |> Frame.indexRowsString("#")
                  |> Frame.getCol "Y"
                  |> Series.get "Grumpy"
printfn "Without inferTypes : %f" p

let (q : float) = Frame.ReadCsv(@"D:\temp\scientific.csv", hasHeaders=true, inferTypes=true, inferRows = 1)
                  |> Frame.indexRowsString("#")
                  |> Frame.getCol "Y"
                  |> Series.get "Grumpy"
printfn "With inferTypes : %f" q

You'll get the following output:

Without inferTypes : 0.000095
Deedle.MissingValueException: Value at the key Grumpy is missing
   at Deedle.Series`2.Get(K key) in c:\Tomas\Public\bmc\Deedle\src\Deedle\Series.fs:line 311
   at Deedle.SeriesModule.Get[K,T](K key, Series`2 series) in c:\Tomas\Public\bmc\Deedle\src\Deedle\SeriesModule.fs:line 275
   at <StartupCode$FSI_0002>.$FSI_0002.main@()
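
One possible explanation, offered as an assumption and not confirmed anywhere in this report: if inference settles on decimal for column Y because the sampled rows contain only plain decimals, a later cell such as 9.51E-05 would fail to parse as decimal and could end up missing. The sketch below uses only standard .NET parsing calls, not Deedle itself, to show that the default decimal number style rejects exponent notation while the float style accepts it. The names cell, inv and the result bindings are illustrative only.

```fsharp
open System
open System.Globalization

// The cell that goes missing in the report above.
let cell = "9.51E-05"
let inv = CultureInfo.InvariantCulture

// NumberStyles.Number (the default style for Decimal.TryParse) does not
// include AllowExponent, so scientific notation is rejected.
let decimalNumberOk, _ = Decimal.TryParse(cell, NumberStyles.Number, inv)

// NumberStyles.Float includes AllowExponent, so the same text parses,
// both as decimal and as double.
let decimalFloatOk, asDecimal = Decimal.TryParse(cell, NumberStyles.Float, inv)
let doubleOk, asDouble = Double.TryParse(cell, NumberStyles.Float, inv)

printfn "decimal with NumberStyles.Number parses: %b" decimalNumberOk
printfn "decimal with NumberStyles.Float parses:  %b" decimalFloatOk
printfn "double with NumberStyles.Float parses:   %b" doubleOk
```

If this assumption about the inferred type holds, it would also explain why the inferTypes=false run succeeds: there the value is only converted to float at the point where it is requested.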