Closed. carsten-j closed this issue 10 years ago.
As an added note, compiling and running the following is quick:
open FSharp.Data

[<EntryPoint>]
let main argv =
    let data = CsvFile.Load("trainSmall.csv")
    for row in data.Rows do
        printfn "%s, %s" (row.GetColumn "pixel99") (row.GetColumn "pixel783")
    0
whereas the following takes quite some time to compile, but runs quickly:
open FSharp.Data

type trainingSet = CsvProvider<"C:/projektit/FsharpDataPerformance/FsharpDataPerformance/trainSmall.csv", ",", CacheRows=false>

[<EntryPoint>]
let main argv =
    let data = trainingSet.Load("trainSmall.csv")
    for row in data.Rows do
        printfn "%i, %i" row.pixel99 row.pixel783
    0
Maybe something to do with reflection. If I have time, I'll try to profile this.
Some quick screen captures of slow cases. The fast case of CsvFile.Load
was rather uneventful, as expected.
It's expected that CsvProvider takes longer to compile, as it reads the CSV values and infers the column types, while CsvFile is untyped. But it should be only slightly slower, not this much.
OK, this is a pathological case for CsvProvider. It only has 114 rows, but has 785 columns! This means we will have a tuple with 785 elements. Most CSV files don't have this many columns, and honestly you won't get much value out of using CsvProvider with this file: it is basically a matrix serialized in CSV format, so all the column types are the same. In any case we can probably optimize this; it's taking a lot of time generating the types, which is uncommon. Compare this csv: With the measurements made in #514:
Reduced from 17s to 12s
Down to 7s
I improved the time it takes to do the first line. As for the second one, I recommend using CsvFile for this case instead.
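As a rough illustration of the CsvFile route for a matrix-like file, here is a minimal sketch. It assumes the FSharp.Data package can be referenced from a script, that every cell parses as an integer, and it uses a tiny made-up inline sample (the `sample` string and the `matrix` name are hypothetical) in place of trainSmall.csv.

```fsharp
// Assumption: FSharp.Data is available as a NuGet package for scripts.
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical stand-in for trainSmall.csv; the real file has 785 columns.
let sample = "label,pixel0,pixel1\n5,0,128\n3,255,7"

// Untyped parse: no types are generated per column at compile time.
let csv = CsvFile.Parse(sample)

// Convert every row's raw string cells to ints (assumes all cells are integers).
let matrix =
    csv.Rows
    |> Seq.map (fun row -> row.Columns |> Array.map int)
    |> Seq.toArray

printfn "%d rows x %d columns" matrix.Length matrix.[0].Length
```

For the real file, `CsvFile.Load("trainSmall.csv")` as used earlier in this thread would take the place of `CsvFile.Parse(sample)`. Since all columns share one type here, an array of arrays loses little compared with the typed rows CsvProvider would generate.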
When I read the attached CSV file, which contains 785 columns and 113 rows (including the header row), the following two lines of code execute really slowly:
When I send the first line to F# Interactive, it returns in about 10 seconds, whereas when I send the second line, it takes more than 5 minutes before the interactive prompt replies.
I am running the code on my 2013 MacBook Pro with a 2.6 GHz i5 processor and 16 GB RAM, using F# 3.0 and Xamarin Studio. I have tried the same experiment with Windows 7 / VS2013 running in a VM on the same hardware. The results are comparable. When I use the same machine and try to do the exact same thing with R, it is so fast that I cannot time it with an ordinary watch.
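For anyone wanting numbers more precise than a watch, a minimal timing sketch follows. The `timeIt` helper is hypothetical, and the workload is a trivial stand-in rather than the CSV code from this issue.

```fsharp
open System.Diagnostics

// Hypothetical helper: runs an action once and returns its result
// together with the elapsed wall-clock time in milliseconds.
let timeIt (action: unit -> 'a) : 'a * int64 =
    let sw = Stopwatch.StartNew()
    let result = action ()
    sw.Stop()
    result, sw.ElapsedMilliseconds

// Stand-in workload; replace with the load or row iteration being measured.
let total, ms = timeIt (fun () -> Seq.sum (seq { 1 .. 1000 }))
printfn "sum = %d, elapsed = %d ms" total ms
```

This only measures run time. Time spent by a type provider while the code is being type-checked is not captured this way; in F# Interactive, the `#time "on"` directive reports timings for each submitted block instead.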
https://dl.dropboxusercontent.com/u/13678102/Script.fsx https://dl.dropboxusercontent.com/u/13678102/trainSmall.csv