fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

Typed access to Rows via an automatically implemented interface #281

Closed: tpetricek closed this 9 years ago

tpetricek commented 9 years ago

Initial implementation that makes it possible to use frame rows in a more type-safe way than using just df.Rows (without waiting for type provider features in the next version of F#). The user defines an interface by hand, and the columns of the frame are then fitted to the members of that interface:

type IStockRow = 
  abstract Open : float 
  abstract High : float 
  abstract Low : float 
  abstract Close : float 

// Given a frame "msft" which has the 4 columns specified by the interface
// we can get the rows as "Series<DateTime, IStockRow>" using "GetRowsAs"
let rows = msft.GetRowsAs<IStockRow>()
rows.[DateTime(2000, 10, 10)].Close

Here are some things & questions to do before checking this in:

hmansell commented 9 years ago

Looks great to me. Is it efficient? I see you emit code, but do we end up avoiding a lot of boxing or other inefficient stuff?

tpetricek commented 9 years ago

The current version is more of a proof of concept and it does not avoid unboxing. It basically generates something like:

member x.Foo : float = 
  underlyingVector.GetObject(<index>) 
  |> Convert.convertType<float> ConversionKind.Flexible

Now that I think more about this, it should be possible to generate more clever code that would avoid boxing & unboxing and would be faster. Thanks for the suggestion - I'll look into that and update the PR.
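
To make the boxing point concrete, here is a minimal sketch that does not use Deedle at all. It is not the code the PR emits: `BoxedRow`, `TypedRow`, and the array-backed columns are hypothetical names and storage chosen only to contrast an accessor that goes through `obj` with one that reads a typed column directly.

```fsharp
open System

// A cut-down version of the row interface from the PR description
type IStockRow =
  abstract Open : float
  abstract Close : float

// Hypothetical boxed variant: columns are stored as obj[], so every
// member access has to unbox and convert the value
type BoxedRow(columns: obj[][], index: int) =
  interface IStockRow with
    member x.Open = Convert.ToDouble(columns.[0].[index])
    member x.Close = Convert.ToDouble(columns.[1].[index])

// Hypothetical typed variant: columns are stored as float[], so the
// member access is a plain array read with no boxing
type TypedRow(opens: float[], closes: float[], index: int) =
  interface IStockRow with
    member x.Open = opens.[index]
    member x.Close = closes.[index]

let opens = [| 10.0; 11.0 |]
let closes = [| 10.5; 11.5 |]

// Both rows expose the same interface and point at row 1
let boxedRow =
  BoxedRow([| Array.map box opens; Array.map box closes |], 1) :> IStockRow
let typedRow = TypedRow(opens, closes, 1) :> IStockRow

printfn "%f %f" boxedRow.Close typedRow.Close
```

Both rows return the same values through `IStockRow`; the only difference is whether the column storage forces a conversion from `obj` on each access.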

tpetricek commented 9 years ago

Two updates here:

As for performance, I added the following to the performance tests. First, using the typed API:

let rows = titanic.GetRowsAs<ITitanicRow>()
let mutable c = 0
for i in fst rows.KeyRange .. snd rows.KeyRange do
  c <- c + rows.[i].Pclass 

And the same thing using the untyped API:

let rows = titanic.Rows
let mutable c = 0
for i in fst rows.KeyRange .. snd rows.KeyRange do
  c <- c + titanic.Rows.[i].GetAs<int>("Pclass")

The typed version takes ~0.4ms and the untyped one ~40ms :-)

hmansell commented 9 years ago

Sweet!

adamklein commented 9 years ago

Wow, that's a great win!

adamklein commented 9 years ago

Looks great! Please merge at your convenience!

tpetricek commented 9 years ago

Thanks - I got rid of the forgotten comments & I'm merging this.