fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
924 stars 197 forks source link

add quantile #428

Closed alexpantyukhin closed 5 years ago

tpetricek commented 5 years ago

This looks great to me! 👍

zyzhu commented 5 years ago

@alexpantyukhin It looks great. Three comments.

  1. Make the function inline to support integer and decimal values. The following lines would do.

    static member inline quantile (quantiles:float[], series:Series<'K, 'V>) =
    let vals = series.Values |> Seq.sort |> Seq.toArray
    let valsLength = vals |> Array.length
    
    quantiles
    |> Array.map(fun q ->
      let floatIndex = q * float(valsLength - 1)
      if floatIndex % 1.0 = 0.0 then
        string q, float vals.[int(floatIndex)]
      else
        let index = int(floatIndex)
        let l = float vals.[index]
        let r = float vals.[index + 1]
    
        string q, l + (r - l) * (floatIndex % 1.0)
    )
    |> Series.ofObservations
  2. Math.Net supports 9 different ways of quantile calculation from R project. https://numerics.mathdotnet.com/descriptivestatistics.html The one you implemented is essentially R7, the Excel version. You can be explicit in the comments of the function saying it's the Excel version of quantile, equivalent to QuantileDefinition.R7 from Math.Net.

In addition, you could add another unit test in the end of stats.fs to use FsCheck and Math.Net quantilecustom function to confirm your number has the same output as Math.Net.

  1. We could add wrapper in the future Deedle.Math to support all other quantilecustom functions from Math.Net. You implementation here is good enough for quick descriptive study.

Thanks a lot.

alexpantyukhin commented 5 years ago

Thank you @zyzhu for your notes! About the first point: do you mean the decimal and integer for series values? Actually I forgot to make a check there that all values in quantiles:float[] should be <= 1.0 .

zyzhu commented 5 years ago

@alexpantyukhin Yes. I meant series could have integer or decimal values. Quantile on that series should still work and output float value. Just like all the other inline stats functions we refactored recently.

You are right about the second point. The parameter values should be checked between 0.0 and 1.0

zyzhu commented 5 years ago

I saw the test didn't pass Math.Net test. Maybe it's worth checking the nuance of Math.Net implementation https://github.com/mathnet/mathnet-numerics/blob/8ba423a8a6f011bb079f736497526ad44c88aad3/src/Numerics/Statistics/SortedArrayStatistics.Single.cs#L299

alexpantyukhin commented 5 years ago

Yes, I saw it. Interesting that it didn't pass only for .netcore. I will check that.

zyzhu commented 5 years ago

Looks great. Maybe it's time to add quantile info to describe function. Thanks!

alexpantyukhin commented 5 years ago

Still need to check tests :) It passed well even in previos time.

zyzhu commented 5 years ago

Travis has passed test. It looks like the status here is stuck. I'll merge it.

alexpantyukhin commented 5 years ago

Cool, thanks! I guess it’s time to add it into describe method. I would like to make describe method according the pandas way. For different types different collection of keys.

alexpantyukhin commented 5 years ago

I will do it.