gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package
https://gavinsimpson.github.io/gratia/

Add cdf, inverse cdf, and rng functions for all families handled by mgcv #183

Open gavinsimpson opened 2 years ago

gavinsimpson commented 2 years ago

Ideally this would go in the {mgcvUtils} package, and ultimately I should contribute it there. Until then, I'm going to gather the functionality here, then figure out how to get it into {mgcvUtils} and get that package on CRAN.

{mgcv} uses the family object of the model to store relevant functions for distributions implemented in the package. There are mgcv::fix.family.rd() and mgcv::fix.family.qf() which add the random deviate generator and quantile function (inverse of the CDF) for a small subset of families, typically the ones R supplies.
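As a rough illustration of the mechanism described above, the sketch below applies both helpers to `poisson()`, one of the families they cover, and calls the functions they attach. The argument names (`p`, `mu`, `wt`, `scale`) are my reading of the {mgcv} source rather than a documented contract, so treat them as an assumption.

```r
library(mgcv)

# Poisson is one of the families handled by both helpers
fam <- fix.family.rd(fix.family.qf(poisson()))

# Quantile function stored on the family object, called as qf(p, mu, wt, scale)
probs <- c(0.1, 0.5, 0.9)
q_fam <- fam$qf(probs, mu = 3, wt = 1, scale = 1)
q_ref <- qpois(probs, lambda = 3)   # the base R equivalent, for comparison

# Random deviate generator stored on the family object, called as rd(mu, wt, scale)
set.seed(1)
y_new <- fam$rd(mu = rep(3, 5), wt = rep(1, 5), scale = 1)

# For a family the helper does not cover, the qf slot is expected to stay empty
fam_ig <- fix.family.qf(inverse.gaussian())
is.null(fam_ig$qf)
```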

If neither the {mgcv} family object nor the relevant fix.family.xx() function covers the family of your model, you are out of luck.

Ideally we want the equivalent of the p, q, and r functions for all the families implemented in {mgcv}. The p and q functions are needed for all distributions so we can compute randomised residuals; the q functions are also used to provide reference bands for QQ or worm plot diagnostics; and the r functions would be useful for generating new values of the response given the estimated parameters of the model.
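To make the use of the p functions concrete, here is a minimal sketch of randomised quantile residuals for a Poisson response, using only base R. The data are simulated and the mean `mu` is taken as known. In practice `mu` would be the fitted values of the model, and `ppois()` would be replaced by the p function of whichever family was fitted.

```r
set.seed(42)

# Simulated stand-in for a fitted model: known mean and Poisson response
mu <- rep(4, 200)
y  <- rpois(length(mu), lambda = mu)

# For a discrete response, draw u uniformly between the CDF just below y
# and the CDF at y, then map it to the standard normal scale
a <- ppois(y - 1, lambda = mu)   # P(Y < y)
b <- ppois(y, lambda = mu)       # P(Y <= y)
u <- runif(length(y), min = a, max = b)
rq_resid <- qnorm(u)

# If the model is adequate these should look approximately standard normal
summary(rq_resid)
```

For a continuous family the randomisation step drops out and the residual is simply `qnorm()` applied to the family's CDF evaluated at the observed response.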

I'm prepared to use other packages here to facilitate this. The {tweedie} package seems the easiest way to add p/q/rtweedie() (note {mgcv} has rTweedie() only), and the {distributional} and {distributions3} packages have most of the other p/q/r functions we need, although their interfaces differ from the p/q/r paradigm. Ideally we'd only use one of {distributional} or {distributions3}.
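A small sketch of what the {tweedie} route might look like, next to the generator {mgcv} already provides. The parameter values are arbitrary and chosen only for illustration. The code is guarded so that it is skipped when {tweedie} is not installed, and it assumes the `power`, `mu`, and `phi` argument names of the {tweedie} functions.

```r
if (requireNamespace("tweedie", quietly = TRUE)) {
  # Illustrative parameter values only (1 < power < 2, as tw() requires)
  mu <- 2
  phi <- 1.2
  power <- 1.5
  y <- c(0.5, 1, 3)

  # CDF and quantile function from {tweedie}
  p_tw <- tweedie::ptweedie(y, power = power, mu = mu, phi = phi)
  q_tw <- tweedie::qtweedie(p_tw, power = power, mu = mu, phi = phi)

  # Random deviates: {tweedie} and the generator that ships with {mgcv}
  set.seed(1)
  r_tw   <- tweedie::rtweedie(5, power = power, mu = mu, phi = phi)
  r_mgcv <- mgcv::rTweedie(rep(mu, 5), p = power, phi = phi)
}
```

Note the differing conventions: {mgcv} calls the power parameter `p` and takes a vector of means, whereas {tweedie} takes the number of deviates `n` first.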

The table below lists the distributions/families in {mgcv} and tracks progress on providing all the required p/q/r functions for them. Individual distributions will be handled in separate (linked) issues, as there will likely need to be discussion to clarify the parameterisations Simon has used for the extra families in {mgcv}, which aren't always straightforward (or as clearly documented as they could be; "pot, meet kettle", as I'm not exactly good at documentation either, so no shade intended there).

Key:

| Family | CDF `pfoo()` | Quantile `qfoo()` | RNG `rfoo()` | Issue |
| --- | --- | --- | --- | --- |
| `binomial()` | :interrobang: | :heavy_check_mark: | :heavy_check_mark: | |
| `gaussian()` | :interrobang: | :heavy_check_mark: | :heavy_check_mark: | |
| `Gamma()` | :interrobang: | :heavy_check_mark: | :heavy_check_mark: | |
| `inverse.gaussian()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `poisson()` | :interrobang: | :heavy_check_mark: | :heavy_check_mark: | |
| `ocat()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `tw()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `Tweedie()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `twlss()` | :interrobang: | :interrobang: | :interrobang: | |
| `negbin()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `nb()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `betar()` | :interrobang: | :heavy_check_mark: | :heavy_check_mark: | |
| `scat()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `ziP()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `cox.ph()` | :interrobang: | :interrobang: | :interrobang: | |
| `gammals()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `gaulss()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `gevlss()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `gumbls()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `shash()` | :interrobang: | :heavy_check_mark: | :heavy_check_mark: | |
| `ziplss()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `mvn()` | :interrobang: | :interrobang: | :heavy_check_mark: | |
| `multinom()` | :interrobang: | :interrobang: | :heavy_check_mark: | |