gamlss-dev / gamlss.dist

gamlss.dist: Distributions for Generalized Additive Models for Location Scale and Shape
https://CRAN.R-project.org/package=gamlss.dist
4 stars 3 forks source link

GB2 distribution with strange quantiles for extreme parameter values #3

Open stefan-1997 opened 4 months ago

stefan-1997 commented 4 months ago

I fitted a GB2 distribution to my income data and then got puzzled when examining the parameter estimates. I observed that for parameters similar to c(809.6687, 116.4727, 0.0342, 0.0278), the quantile values returned by qGB2 are somewhat strange. For instance, the 10% Percentile is still 0 which does not make sense to me, in particular when having a look at the associated density function. Also, using the random number generator rGB2 many 0 values are drawn from the distribution.

I compared the results to the ones by the GB2 R package. Here, the quantile values look more reasonable to me for the very same set of GB2 parameters. For standard GB2 parameters, the two packages return identical results but not for some extreme parameter values. Why is that and might this be a potential bug in the gamlss.dist package?

Here, my example code:

##### GB2 Distribution #####

rm(list = ls())

set.seed(79)

library(GB2)
library(gamlss.dist)

params = c(809.6687, 116.4727, 0.0342, 0.0278)
# params = c(1, 1, 1, 0.5)
mu = params[1]
sigma = params[2]
nu = params[3]
tau = params[4]

GB2::qgb2(c(0.1, 0.5, 0.9), shape1 = sigma, scale = mu, shape2 = nu, shape3 = tau)
gamlss.dist::qGB2(c(0.1, 0.5, 0.9), mu = mu, sigma = sigma, nu = nu, tau = tau)

GB2::pgb2(c(10, 100, 1000, 10000), shape1 = sigma, scale = mu, shape2 = nu, shape3 = tau)
gamlss.dist::pGB2(c(10, 100, 1000, 10000), mu = mu, sigma = sigma, nu = nu, tau = tau)

GB2::dgb2(c(10, 100, 1000, 10000), shape1 = sigma, scale = mu, shape2 = nu, shape3 = tau)
gamlss.dist::dGB2(c(10, 100, 1000, 10000), mu = mu, sigma = sigma, nu = nu, tau = tau)

r1 = GB2::rgb2(100000, shape1 = sigma, scale = mu, shape2 = nu, shape3 = tau)
summary(r1)
sum(r1 == 0)
r2 = gamlss.dist::rGB2(100000, mu = mu, sigma = sigma, nu = nu, tau = tau)
summary(r2)
sum(r2 == 0)
zeileis commented 4 months ago

Thanks, Stefan @stefan-1997, for raising this point, very useful.

The differences come from GB2 using the d/p/q/r functions for the beta distribution, i.e., qbeta() and friends, while gamlss.dist relies on the F distribution, i.e., qf() and friends. I simplified your quantile example to:

GB2::qgb2(0.1, 20, 20, 0.03, 0.02)
## [1] 1.981105
gamlss.dist::qGB2(0.1, 20, 20, 0.03, 0.02)
## [1] 0

The reason for this is that qbeta() is still stable while qf() is not:

qbeta(0.1, 0.03, 0.02)
## [1] 8.270811e-21
qf(0.1, 2 * 0.03, 2 * 0.02)
## [1] 0

Similar problems seem to occur with the corresponding "p" and "d" function where "beta" works but "f" does not in certain cases.

Whether this advantage of beta over F only applies to this case or more generally, I cannot say. But it might be worth having a closer look. Mikis @mstasinopoulos and Bob @rigbyr did you consider this when implementing GB2 in gamlss.dist?

Finally, the random number generator gamlss.dist::rGB2() uses the inversion method which is problematic for many distributions. Here, it inherits the pathologies above from qGB2(). In contrast, GB2::rgb2() directly uses rbeta() which, again, seems to be numerically more stable.