fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/
Other
205 stars 54 forks source link

[Feature Request] Add quantile function for normal distribution (inverse CDF) #132

Closed kMutagene closed 1 year ago

kMutagene commented 3 years ago

Is your feature request related to a problem? Please describe.

Q-Q Plots are a very handy exploratory plot when analyzing data. Currently, there is no way to construct these plots with FSharp.Stats functions only, as inverse CDF (quantile function) is missing.

To construct a (normal) Q-Q plot, first the empirical Quantiles are calculated by one of these methods for the ranked observations in the sample of interest:

image

Subsequently, each quantile for a ranked observation is used as input for the inverse CDF of a normal distribution (e.g. N(0,1)). These values are plotted against each other:

image

note that for the inverse CDF of the normal distribution, we need the inverse erf which is currently not there as well.

Describe the solution you'd like

Addition of:

Describe alternatives you've considered

None. but here is a workaround to make it work as long as there are no built in functions:

Here is an inverse erf approximation i took from here:

let erfInv (f:float) =
    let s = sign f |> float
    let x = (1. - f) * (1. + f)
    let lnx = log x
    let tt1 = 2./(System.Math.PI * 0.147) + 0.5 * lnx
    let tt2 = 1./(0.147) * lnx

    (s * sqrt(-tt1 + sqrt(tt1*tt1 - tt2)))

Inverse CDF base don that:

let inverseCDFNorm mean sigma p = mean + ( sigma * (erfInv (2.* p - 1.) * (sqrt 2.)))

Code for above plot using rankit for a sample of 1000 observations from a normal distribution (should fall on a straight line on the plot):

#r "nuget: Plotly.NET, 2.0.0-beta9"
#r "nuget: FSharp.Stats, 0.4.0"

open Plotly.NET
open FSharp.Stats

let di = Distributions.Continuous.normal 100. 20.

let sample = List.init 1000 (fun x -> di.Sample())

let quants = 
    sample
    |> List.sort
    |> fun l ->
        l
        |> List.mapi (fun i x ->
            let r = float i + 1.
            (r - 0.5) / ( float l.Length)
        )

let norm =
    quants 
    |> List.map (inverseCDFNorm 0. 1.)

Distributions.Continuous.Normal.CDF 0. 1. 1.644669874

Chart.Point(
    norm,
    sample
    |> List.sort
)
|> Chart.withX_AxisStyle "theoretical quantiles"
|> Chart.withY_AxisStyle "observations"
|> Chart.Show
bvenn commented 1 year ago

closed by #238