Closed sebhofer closed 5 years ago
There is no good reason for why the stats functions are defined only for float
. It's mainly because that is the "default numerical series" in Deedle in some other places. For example df?SomeColumn
also defaults to float
.
We could add support for other types. I suspect many of the functions might not make sense for integers, but it would be good to support those that do. We could add float32
but that's likely not very useful for anyone. Having decimal
would probably make sense though. (Alternatively, we could see if we can actually make those functions generic using static constraints, that might be a good approach too.)
I actually think that most of the functions would make sense for integers. It’s just that the result is not an integer. Just think about mean or variance. A sequence of integers certainly does have a mean, and there should be a function to calculate it.
But I really don’t know what would be a good strategy to handle these functions for different types (yet).
Edit: Or did you mean users should just directly create frames from floats instead of ints? I guess that would make sense too. However, if it’s reasonably simple to support ints, I think it would make sense. And decimals would certainly be great too!
@tpetricek I've tried overload member functions but has some problems.
For instance, I overloaded mean function
static member mean (series:Series<'K, float>) =
let sums = initSumsSparse 1 (valuesAllOpt series)
sums.sum / sums.nobs
static member mean (series:Series<'K, int>) =
let sums = initSumsSparse 1 (valuesAllOpt (series |> Series.mapValues float))
sums.sum / sums.nobs
But I also have to overload levelMean functions
static member levelMean (level:'K -> 'L) (series:Series<'K, float>) =
Series.applyLevel level (Stats.mean:Series<'K,float>->float) series
static member levelMean (level:'K -> 'L) (series:Series<'K, int>) =
Series.applyLevel level (Stats.mean:Series<'K,int>->float) series
But levelMean gives me error message as 'Method with curried arguments cannot be overloaded. Consider using a method taking tupled arguments'
Any idea of how to get around this problem?
By now I'm thinking: Wouldn't it probably make more sense to make the functions generic? This would require rewriting some of the helper functions, but I think it could be done and it looks like the cleaner (and possibly even more efficient) approach to me.
@zyzhu Yeah, I think this is the problem - if we want to keep the nice pipe-compatible syntax like ts |> Series.levelMean fst
then there is no way to have the function/member overloaded - the overload resolution only works based on the first argument and so I think defining the above overload for levelMean
is not possible. What the error message is saying is that we could define it as:
static member levelMean (level:'K -> 'L, series:Series<'K, float>) =
... but this would no longer work nicely with pipes.
I think there might be a way to make this generic with some inline
and static type constraints magic. That sounds good to me, but it will probably make the code a bit more complicated.
close issue as it's addressed in https://github.com/fslaborg/Deedle/pull/418
While thinking about implementation of a
describe
function (cf. #398) today, I realized that allStats
functions are defined forfloat
only. I don't see any reason why those (in principle) shouldn't exist for other numeric types. I'm guessing that one could just cast all other numeric types to floats and then use the existing functionality. As the result of those functions are floats in most cases, this would not [edited] restrict us in any way.I'd be willing to give it a shot, but I need some pointers how to achieve this in an efficient and manageable way. Any suggestions (@tpetricek @zyzhu )?