fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
924 stars 197 forks source link

Stats functions for other numeric types than float #410

Closed sebhofer closed 5 years ago

sebhofer commented 5 years ago

While thinking about implementation of a describe function (cf. #398) today, I realized that all Stats functions are defined for float only. I don't see any reason why those (in principle) shouldn't exist for other numeric types. I'm guessing that one could just cast all other numeric types to floats and then use the existing functionality. As the result of those functions are floats in most cases, this would not [edited] restrict us in any way.

I'd be willing to give it a shot, but I need some pointers how to achieve this in an efficient and manageable way. Any suggestions (@tpetricek @zyzhu )?

tpetricek commented 5 years ago

There is no good reason for why the stats functions are defined only for float. It's mainly because that is the "default numerical series" in Deedle in some other places. For example df?SomeColumn also defaults to float.

We could add support for other types. I suspect many of the functions might not make sense for integers, but it would be good to support those that do. We could add float32 but that's likely not very useful for anyone. Having decimal would probably make sense though. (Alternatively, we could see if we can actually make those functions generic using static constraints, that might be a good approach too.)

sebhofer commented 5 years ago

I actually think that most of the functions would make sense for integers. It’s just that the result is not an integer. Just think about mean or variance. A sequence of integers certainly does have a mean, and there should be a function to calculate it.

But I really don’t know what would be a good strategy to handle these functions for different types (yet).

Edit: Or did you mean users should just directly create frames from floats instead of ints? I guess that would make sense too. However, if it’s reasonably simple to support ints, I think it would make sense. And decimals would certainly be great too!

zyzhu commented 5 years ago

@tpetricek I've tried overload member functions but has some problems.

For instance, I overloaded mean function

  static member mean (series:Series<'K, float>) =
    let sums = initSumsSparse 1 (valuesAllOpt series)
    sums.sum / sums.nobs

  static member mean (series:Series<'K, int>) =
    let sums = initSumsSparse 1 (valuesAllOpt (series |> Series.mapValues float))
    sums.sum / sums.nobs

But I also have to overload levelMean functions

  static member levelMean (level:'K -> 'L) (series:Series<'K, float>) = 
    Series.applyLevel level (Stats.mean:Series<'K,float>->float) series

  static member levelMean (level:'K -> 'L) (series:Series<'K, int>) = 
    Series.applyLevel level (Stats.mean:Series<'K,int>->float) series

But levelMean gives me error message as 'Method with curried arguments cannot be overloaded. Consider using a method taking tupled arguments'

Any idea of how to get around this problem?

sebhofer commented 5 years ago

By now I'm thinking: Wouldn't it probably make more sense to make the functions generic? This would require rewriting some of the helper functions, but I think it could be done and it looks like the cleaner (and possibly even more efficient) approach to me.

tpetricek commented 5 years ago

@zyzhu Yeah, I think this is the problem - if we want to keep the nice pipe-compatible syntax like ts |> Series.levelMean fst then there is no way to have the function/member overloaded - the overload resolution only works based on the first argument and so I think defining the above overload for levelMean is not possible. What the error message is saying is that we could define it as:

static member levelMean (level:'K -> 'L, series:Series<'K, float>) = 

... but this would no longer work nicely with pipes.

I think there might be a way to make this generic with some inline and static type constraints magic. That sounds good to me, but it will probably make the code a bit more complicated.

zyzhu commented 5 years ago

close issue as it's addressed in https://github.com/fslaborg/Deedle/pull/418