fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
929 stars 196 forks source link

Frame.ofRowsOrdinal should only iterate the sequence ONCE #356

Closed kflu closed 6 years ago

kflu commented 7 years ago

Consider the following code:

#r @"packages\Deedle\lib\net40\Deedle.dll"
open Deedle

seq {
    printfn "%A" "run 1"
    yield series [ "col1" => 111; "col2" => 222; "col3" => 333 ]
    printfn "%A" "run 2"
    yield series [ "col1" => 111; "col2" => 222; "col3" => 333 ]
}
|> Frame.ofRowsOrdinal 

The output is

"run 1"
"run 2"
"run 1"
"run 2"

This means Frame.ofRowsOrdinal iterates the sequence TWICE. This is a big issue when the implementation of the sequence involves some resources at may only be able to be iterated once, for example, in my case, an IDataReader instance from a database connection. More specifically, an exception is thrown from my IDataReader instance that complains the reader is already closed. This is due to the two iterations on it, and after the first iteration, the reader is already closed.

One related issue is that when calling Deedle.ReadReader on my IDataReader instance, it threw the exactly the same exception as mentioned above. But this should be fixed if Deedle caches the sequence and don't iterate on it twice.

Yes, I can work this around by caching my sequence into an array before feeding it to Frame.ofRowsOrdinal. But in general, the common practice of working with seq and IEnumerable is that the function that takes the sequence (the callee) should take care of caching if it is to be iterates multiple times.

zyzhu commented 6 years ago

fixed in https://github.com/fslaborg/Deedle/pull/396