fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

ReadCsv cannot set empty string as missingValues #441

Closed (zyzhu closed this issue 4 years ago)

zyzhu commented 5 years ago

Repro steps

sample.csv (column c3 is entirely empty):

row,c1,c2,c3
1,,5,
2,4,6,

The following code reads the file with the empty string ("") as a missing value, then accesses a row by its key.

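open Deedle
[<Literal>]
let sample = "C:/FSharp/sample.csv"
// Treat "" as a missing value, then index rows by the "row" column
let r = Frame.ReadCsv(sample, missingValues = [| "" |]).IndexRows<int>("row")
// Convert the row with key 2 to a series of floats
r.Rows.[2].As<float>()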

Expected outcome

val r : Frame<int,string> =

     c1        c2 c3        
1 -> <missing> 5  <missing> 
2 -> 4         6  <missing> 

val it : Series<string,float> =

c1 -> 4         
c2 -> 6         
c3 -> <missing> 

Actual outcome

val r : Frame<int,string> =
     c1        c2 c3 
1 -> <missing> 5      
2 -> 4         6     

System.FormatException: Input string was not in a correct format.

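Suggestion

The empty string cannot be used as a missing value because of the following line in FSharp.Data's CSV inference: the c3 column is inferred to be string even though "" is listed in missingValues. As a result, c3 holds empty strings rather than missing values, which is why converting row 2 to float fails with the FormatException above. https://github.com/fsharp/FSharp.Data/blob/master/src/Csv/CsvInference.fs#L130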
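To make the inference problem concrete, here is a minimal, self-contained F# sketch of per-column type inference. It is a hypothetical model, not FSharp.Data's code at the linked line: `isNumber`, `columnType` and the `allMissingFallback` parameter are names invented for illustration. The model assumes that cells listed in missingValues are dropped first, that a column whose remaining cells are all numeric becomes float, and that a column with no remaining cells falls back to a fixed type. Passing "string" as the fallback reproduces the symptom reported for c3, while "float" gives the behaviour the issue expects.

```fsharp
open System
open System.Globalization

// Hypothetical helper: true if the cell parses as a float (invariant culture)
let isNumber (s: string) =
    fst (Double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture))

// Hypothetical model of column inference (NOT FSharp.Data's implementation):
// drop cells listed in missingValues, then decide the type from what is left.
// A column with nothing left uses allMissingFallback.
let columnType (allMissingFallback: string) (missingValues: string[]) (cells: string list) =
    let present = cells |> List.filter (fun c -> not (Array.contains c missingValues))
    match present with
    | [] -> allMissingFallback
    | xs when List.forall isNumber xs -> "float"
    | _ -> "string"

// Cells of c1 and c3 from sample.csv
let missing = [| "" |]
let c1 = [ ""; "4" ]
let c3 = [ ""; "" ]

printfn "%s" (columnType "string" missing c1) // float
printfn "%s" (columnType "string" missing c3) // string (the reported symptom)
printfn "%s" (columnType "float" missing c3)  // float (what the issue expects)

// For comparison: parsing "" as a float with Double.Parse raises a FormatException
let emptyStringParseFails =
    try
        Double.Parse("", CultureInfo.InvariantCulture) |> ignore
        false
    with :? FormatException ->
        true
```

In this model the only difference between the two outcomes is what an all-missing column defaults to; the empty cell in c1 is already handled as missing because c1 has other numeric cells.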

Waiting until FSharp.Data addresses the following issue: https://github.com/fsharp/FSharp.Data/issues/1192