fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

Deedle throws System.OutOfMemoryException when reading large csv file #332

Closed: denskh closed this issue 5 years ago

denskh commented 8 years ago

The Frame.ReadCsv method throws System.OutOfMemoryException for files of about 200MB or larger in 32-bit FSI. Sample code to generate a test file and reproduce the issue:

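// assumes Deedle is already referenced, e.g. #r "Deedle.dll" (path depends on your setup)
open Deedle

// generate test data set: 3,000,001 identical rows, about 200MB on disk
let file = "c:/temp/test.csv"
System.IO.File.WriteAllLines(file, [for _ in 0..3000000 -> "20160114,ABC,acc12345,entity llc,Joe Doe,default,port1,FWD,ABC.TO,CAD"])
// reproduce the issue: the file has no header row, so column names come from the schema
let df =
    Frame.ReadCsv(
        file,
        hasHeaders = false,
        schema = "report_date,source_system,account,legal_entity,trader,strategy,portfolio,security_type,security,currency")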
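A 32-bit process has only a few GB of address space (typically 2GB of user space on Windows), so a load like the one above can run out of memory well before physical RAM is exhausted. One way to gauge how much managed memory a load needs, before trying the full file, is to measure heap growth on a smaller sample. This is a minimal sketch, assuming a recent `dotnet fsi` that supports `#r "nuget: ..."` (the 2016-era FSI in this thread would reference Deedle.dll directly instead). `writeSample` and `measure` are hypothetical helpers, and `GC.GetTotalMemory` gives only an approximation.

```fsharp
#r "nuget: Deedle"

open System
open System.IO
open Deedle

let sampleLine = "20160114,ABC,acc12345,entity llc,Joe Doe,default,port1,FWD,ABC.TO,CAD"
let schema = "report_date,source_system,account,legal_entity,trader,strategy,portfolio,security_type,security,currency"

// Hypothetical helper: writes `rows` copies of the sample line to a temp file.
// Seq.replicate is lazy, so the generator does not build a full list in memory.
let writeSample (rows: int) =
    let path = Path.Combine(Path.GetTempPath(), "deedle-sample.csv")
    File.WriteAllLines(path, Seq.replicate rows sampleLine)
    path

// Hypothetical helper: loads the sample with the same options as the issue and
// reports the approximate managed-heap growth while the frame is still alive.
let measure (rows: int) =
    let path = writeSample rows
    let fileBytes = FileInfo(path).Length
    let before = GC.GetTotalMemory(true)   // force a collection first
    let df = Frame.ReadCsv(path, hasHeaders = false, schema = schema)
    let after = GC.GetTotalMemory(true)
    let rowCount = df.RowCount
    GC.KeepAlive df                        // keep the frame reachable until here
    printfn "%d rows, file: %d bytes, heap growth: ~%d bytes" rowCount fileBytes (after - before)
    rowCount, fileBytes, after - before

measure 100000 |> ignore
```

Scaling the heap growth of a sample by the ratio of file sizes gives a rough, order-of-magnitude estimate for the full file. The in-memory representation need not scale exactly linearly, so treat the number as a guide, not a guarantee.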
tpetricek commented 8 years ago

In general, it is probably a good idea to use the 64-bit F# Interactive (fsiAnyCpu.exe) when you work with larger data. That said, I had a brief look at this and made some improvements. Can you test with a build from #334?
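To confirm which flavor of F# Interactive a session is actually running (fsi.exe or fsiAnyCpu.exe), you can check the process bitness from inside FSI. This small sketch relies only on the standard .NET `Environment` and `IntPtr` members:

```fsharp
open System

// Environment.Is64BitProcess is true when the current process (e.g. FSI) runs as 64-bit.
// IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit one.
let is64Bit = Environment.Is64BitProcess
printfn "64-bit process: %b, pointer size: %d bytes, 64-bit OS: %b"
    is64Bit IntPtr.Size Environment.Is64BitOperatingSystem
```

If this prints `false` for the process on a 64-bit OS, the session is the 32-bit FSI and is subject to the smaller address-space limit.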

denskh commented 8 years ago

It definitely works better; I can load files that failed before! I will play with it more tomorrow.

Update: I used it a lot today in both 32-bit and 64-bit FSI, with no issues whatsoever. The build from #334 can process larger files, but 64-bit is the way to go. ReadCsv load times seem identical to the NuGet build.