fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/
Other
205 stars 54 forks source link

[Feature Request] Inverse CDF of distributions #262

Open ghost opened 1 year ago

ghost commented 1 year ago

Is your feature request related to a problem? Please describe. Inverse CDFs are useful for calculating credible intervals for a given distribution, among other things.

Describe the solution you'd like It would be great to have inverse CDFs for all distributions. But starting from normal distribution would be great.

bvenn commented 1 year ago

The normal InvCDF for mean = 0 and sigma = 1 is already implemented at an inproper position:

https://github.com/fslaborg/FSharp.Stats/blob/b74ecf29651b32f66e328958931cdb2d2dd8dc0f/src/FSharp.Stats/Signal/QQPlot.fs#L91-L92

I agree, we should add quantile functions as InvCDF for all distributions :+1:

bvenn commented 1 year ago

I've added an InvCDF member to all distributions by 3d6a2201c45d792d95cc127c97c377fa4bcf496c. I noticed the approximation of the inverse error function leads to some discrepancies when extreme values are chosen and compared to the R qnorm procedure.

// Testing FSharp.Stats
(Distributions.Continuous.Normal.InvCDF 0. 1. 0.5)  //0.
# Testing R
qnorm(0.5,0,1)
# Testing Python
from scipy.stats import norm
norm.ppf(0.5, loc=0, scale=1)
Mean StDev X result FSharp.Stats result R result Python
0 1 0.5 1.253321755e-09 0 0
0 1 0 -infinity -infinity -infinity
0 1 1 infinity infinity infinity
3 0.01 0.01 2.97673652985179 2.97673652126 2.97673652125959
-300000 5000 0.99 -288368.2649258 -288368.2606298 -288368.2606297958

While the deviation is small and just occurs at extreme values, it would be worth checking if the approximation presented in Wichura, “Algorithm AS 241: The Percentage Points of the Normal Distribution.”, 1988 should be implemented.

References

bvenn commented 1 year ago

I've implemented the quantile function of the normal distribution as described in Wichura et al.. Its accurate for 15 decimal places.

bvenn commented 1 year ago

Progress of adding InvCDF/quantile functions t continuous distributions

bvenn commented 11 months ago

Many distributions have no closed form of the quantile function. Besides published approximations would be beneficial to add a member for each Distribution that approximates the correct x for a given p. The CDF is continuously increasing and therefore a root finding approach should work just fine. I propose the following:

type MyDistribution =

    static member PDF a b x = ...

    static member CDF a b x = ...

    static member InvCDF a b x = //possible no closed form exists

    static member InvCDFApprox a b x accuracy = 
        ///parameters: function (float -> float); accuracy (float); minimum (float); maximum (float); maxIterations
        let tmp = Optimization.Bisection.tryFindRoot (fun x -> MyDistribution.CDF a b x - p) accuracy 0. 1. 1000
        match tmp with 
        | Some x -> x
        | None -> failwith "no InvCDF found to satisfy the given conditions"

Drawbacks

To discuss: