fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data

Type construction very slow for CSV files with a large number of variables, e.g. 1000+ #1207

Open fwaris opened 6 years ago

fwaris commented 6 years ago

For a CSV file with 1000+ variables (columns), it takes several minutes for the type to be defined, even when the file contains only a small number of rows.

Memory consumption also approaches 8 GB.

I routinely encounter files with hundreds and sometimes 1000+ columns. I'm not sure whether FSharp.Data can be optimized to read wide files faster, but hopefully someone can shed some light. I don't have time to dig into it right now.

Faisal

fwaris commented 6 years ago

Here is an anonymized version of the CSV sample file that seems to cause this issue.

data_sample.zip

Sample code:

open FSharp.Data

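// The path must be a compile-time literal so it can be passed to the type provider
[<Literal>]
let data_file = @"data_sample.csv"

// The provider generates one typed property per column found in the sample file
type Tdata = CsvProvider<data_file>

let tdata = Tdata.GetSample()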
tdata.Headers.Value.Length

dsyme commented 6 years ago

If you have profiling tools, could you profile the execution of fsc.exe compiling this file and post the top inclusive and top exclusive methods?
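
For anyone trying to reproduce this without the attached sample, synthetic files of known width may make it easier to see how compile time grows with the number of columns. The following is only a minimal sketch, assuming an F# script run with `dotnet fsi`; the `writeWideCsv` helper, the `c0, c1, ...` column names, and the `wide_N.csv` file names are hypothetical and are not part of FSharp.Data.

```fsharp
// Sketch: generate synthetic wide CSV files for a repro.
// All names here (writeWideCsv, wide_N.csv, c0..cN) are made up for illustration.
open System.IO

/// Writes a CSV with `columns` integer columns named c0, c1, ... and `rows` data rows.
let writeWideCsv (path: string) (columns: int) (rows: int) =
    let header = String.concat "," (Seq.init columns (fun i -> sprintf "c%d" i))
    let row r = String.concat "," (Seq.init columns (fun i -> string (r + i)))
    File.WriteAllLines(path, Seq.append [ header ] (Seq.init rows row))

// One file per width, each with only 5 rows, so that width is the only variable.
for columns in [ 100; 500; 1000 ] do
    writeWideCsv (sprintf "wide_%d.csv" columns) columns 5
```

Pointing the `data_file` literal from the sample code at each generated file in turn, and timing or profiling the compile for each, should show whether the cost grows with column count alone.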