fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/
Other
205 stars 54 forks source link

[BUG] Beta distribution PDF returns nan unexpectedly #258

Closed DoganCK closed 1 year ago

DoganCK commented 1 year ago

Describe the bug Beta distribution PDF returns nan after a threshold when it should not.

To Reproduce Steps to reproduce the behavior:

#r "nuget: FSharp.Stats"
#r "nuget: MathNet.Numerics"

open FSharp.Stats.Distributions.ContinuousDistribution
open MathNet.Numerics.Distributions

(beta 100. 100.).PDF 0.5     // F#Stats   11.2696958
Beta(100., 100.).Density 0.5 // MathNet   11.2696958

(beta 600. 600.).PDF 0.5     // F#Stats   nan
Beta(600., 600.).Density 0.5 // MathNet   27.63377432

Expected behavior Similar results with MathNet. They resort to LogPDF when either one of the params is greater than 80.

OS and framework information (please complete the following information):

bvenn commented 1 year ago

This happens because of floating point errors. The PDF is defined as:

PDF(x) = (x * (alpha - 1.0)) ((1.0 - x) ** (beta - 1.0)) / (SpecialFunctions.Beta._beta alpha beta)

The first factor: 0.5599 already returns 0.0. The maximal exponent to get a valid result is 0.5248 resulting in 1.976262583e-323.

Either the variables must be converted from float to decimal to facilitate the calculation with such extreme values, or there is an alternative algorithm to determine the PDF without the need for these extreme values.

(Wikipedia PDF)

https://github.com/mathnet/mathnet-numerics/blob/bc0bd6e4b33b8f58e99ff854755b4a3220881a84/src/Numerics/Distributions/Beta.cs#L559

https://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm

bvenn commented 1 year ago

I've transformed the PDF to its corresponding PDFLn and updated the density estimation with a switch for high alpha/beta parameters. The function is tested for alpha/beta <= 2000 but should handle values way above this.

(beta 600. 600.).PDF 0.5     // F#Stats   27.63377432