Closed florianhartig closed 2 years ago
extended simulate functions in https://github.com/mfasiolo/mgcViz
Just wanted to add a note here that also the Tweedie
and negbin
appear to not be supported (as well as the tw
and nb
as mentioned above).
Trying the example for Tweedie
/tw
:
library(mgcv)
set.seed(3)
n <- 400
dat <- gamSim(1,n=n,dist="poisson",scale=.2)
dat$y <- rTweedie(exp(dat$f),p=1.3,phi=.5)
b2 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=tw(),
data=dat,method="REML")
as expected:
> simulateResiduals(b2)
Error in checkModel(fittedModel) :
It seems you are trying to fit a model from mgcv that was fit with an extended
.family. Simulation functions for these families are not yet implemented in DHAR
Ma. See issue https://github.com/florianhartig/DHARMa/issues/11 for updates abou
t this
But now:
b1 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=Tweedie(1.25,power(.1)),
data=dat,method="REML")
yields:
> simulateResiduals(b1)
Error in simulate.lm(object, nsim = nsim, ...) :
family 'Tweedie(1.25)' not implemented
But Tweedie
is not an extended.family
object:
> class(b1$family)
[1] "family"
> class(b2$family)
[1] "extended.family" "family"
The same is true for the examples in ?negbin
:
library(mgcv)
set.seed(3)
n<-400
dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
dat$y <- rnbinom(g,size=3,mu=g)
b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)
We get:
> simulateResiduals(b)
Error in checkModel(fittedModel) :
It seems you are trying to fit a model from mgcv that was fit with an extended
.family. Simulation functions for these families are not yet implemented in DHAR
Ma. See issue https://github.com/florianhartig/DHARMa/issues/11 for updates abou
t this
and
> simulateResiduals(b0)
Error in simulate.lm(object, nsim = nsim, ...) :
family 'Negative Binomial(3)' not implemented
Apologies if this is a duplicate of a bug elsewhere, I didn't see it when searching. Using DHARMa
installed from github (version 0.3.2.0).
Hi David, I assume you were looking at this because of my ISEC presentation, where I said wrt to a question that Tweedie is supported?
When I said this, I meant that it is in principle supported from the DHARMa side (I recently fixed some issue regarding mixed distributions), but of course your regression package must still be able to simulate a fitted model with the tweedie, and if that is not the case you have to simulate by hand.
For mgcv, this is apparently not the case. The issue here is really on the mgcv side, so as long a Simon doesn't implement the simulate() function for this distributions, there is little I can do about this for gam. I think he's aware of the issue, but it's not very high on his list. I should have a look at Matteo's work with mgcViz though, maybe this solves the problem.
You are right though, the help / error messages should be clarified!
p.s. - you can fit tweedie with glmmTMB, but of course, not with a gam.
Thanks for the quick reponse! Yes, I was in the ISEC presentation (which was great, thank you!).
Totally understand the issue re: lack of a simulate
method, just wanted to make sure that others who tried this out would find the issue and know what to do. The error message for nb
/tw
is great if that can be used for the other options (with modified text around the extended family part) that would make things clear I think.
yes, that is what I will try to do - thanks!
Hi both, I wonder whether the best solution would be to create a package called, say, gamUtils which provides all the things that we would like to have but are currently missing in mgcv, such as simulators for all the available families. Then, Dharma and mgcViz could depend on it.
simulate.gam() in mgcViz simulates responses using either the family$rd() or the family$qf() functions but does not know what to do if those are missing. An extra package could provide those missing things, while avoiding having duplicated code in mgcViz, DHarma, etc.
Sure, why not. Of course, this only makes sense if there is no chance that Simon will implement this directly in mgcv (which would be preferable imo)
I'd be happy to contribute to this, if I can be helpful @mfasiolo. Perhaps this work in the simulate
package is a good place to start?
That would be great @dill! I was thinking of doing this in the context of the gamFactory package, but maybe a separate "gamUtils" (say) package would be preferable.
@florianhartig I agree that mgcv would be the best place to provide such functions, but note that potentially there are a lot of family-specific utilities that could be useful for model checking (simulation, quantile function, CDF, expected values, ...). Simon has enough work to do with GAM fitting methods, hence providing such utilities outside mgcv seems a more realistic option at the moment.
Okay I've created a gamUtils
repo to hold further discussion on this so this doesn't overwhelm @florianhartig's mailbox (but please join the discussion there if you're interested 😄).
Just to jump in here; I have a simulate.gam()
method in gratia that will work as per simulate()
in base R for "gam"
class objects (there's methods for "gamm"
and "scam"
too) if mgcv's fix.family.rd()
can add the random number generator to the family or it is already inside the family
objects of the model as family(model)$rd
.
Hi guys, this is an excellent. I'll be happy to lean back and include any solution to simulate from mgcv that works :)
@gavinsimpson , one question - in your package, it says "Simulate from the posterior distribution", so are you simulating with parameter uncertainty? This is interesting, but it would be useful to be able to switch this off, to conform to all the other packages.
Hi all,
I realise that this is an old thread but I have finally managed to work on this and may have found an acceptable solution for the moment, which is, with a few workarounds, to optionally make DHARMa depend on mgcViz.
I have seen you have worked on https://github.com/dill/gamUtils, and I'm happy to switch to this, but for DHARMa I am very hesitant to depend on something that is not on CRAN.
@gavinsimpson, thanks for your suggestion, I have also looked at gratia, but from what I understood, the simulations are done with parameter uncertainties, i.e. something like posterior predictive simulations, which would break the logic of DHARMa, as all other simulations are done from the MLE. I also think that, for quantile residual checks, simulations from the MLE are preferable, as the logic here is to test if a point estimate (and not an entire interval) can be rejected by the data.
p.s. in case someone wants to check this out the branch / PR for the inclusion of mgcViz is here https://github.com/florianhartig/DHARMa/pull/308
this is great @florianhartig. gamUtils
has taken a back seat due to other time commitments for me at the moment, but I'm very happy for any solution to this that enables people to use DHARMa
.
@gavinsimpson, thanks for your suggestion, I have also looked at gratia, but from what I understood, the simulations are done with parameter uncertainties, i.e. something like posterior predictive simulations, which would break the logic of DHARMa, as all other simulations are done from the MLE. I also think that, for quantile residual checks, simulations from the MLE are preferable, as the logic here is to test if a point estimate (and not an entire interval) can be rejected by the data.
@florianhartig This may actually a bug in the documentation. gratia:::simulate.gam
(and co) simulates new values of the response conditional on the estimated model. It's basically a predict.gam()
to get the expected value of the response given the model followed by a call to the corresponding rXXXX()
for the assumed conditional distribution of Y (rnorm()
for Gaussian etc) where these exist.
However, ?simulate.gam
talks about sampling from posteriors and it's wrong; it's a hangover from when simulate.gam
was used to generate posterior draws for individual smooths. I realized this was not compatible with the intention of things in base R like simulate.lm
etc, and so I changed simulate.gam
to work like the base R implementations for LM, GLMs etc. Updating the documentation seems not to have happened when I did this however.
Things like fitted_samples()
, predicted_samples()
do take draws from the posterior distribution of expected values and the response respectively using a Gaussian approximation to the posterior (and hopefully soon via a Metropolis Hasting sampler via mgcv::gam.mh()
)
@florianhartig Sorry I missed your question about simulate.gam
and posterior sampling from a year ago - I could have clarified that then (and fixed the documentation sooner).
Should be done with https://github.com/florianhartig/DHARMa/pull/308, I leave this open for a final check!
Do I understand correctly that tw() and nb() should be working now at 0.4.5?
I stole the nb() and tw() examples that @dill gave above and tried them out with 0.4.5 :
set.seed(3)
n <- 400
dat <- gamSim(1,n=n,dist="poisson",scale=.2)
dat$y <- rTweedie(exp(dat$f),p=1.3,phi=.5)
b2 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=tw(), data=dat,method="REML")
simulateResiduals(b2)
yields
Error in checkModel(fittedModel) :
It seems you are trying to fit a model from mgcv that was fit with an extended.family. Simulation functions for these
families are not yet implemented in DHARMa. See issue https://github.com/florianhartig/DHARMa/issues/11 for updates
about this
Likewise, for:
dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
dat$y <- rnbinom(g,size=3,mu=g)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)
simulateResiduals(b)
We get
Error in checkModel(fittedModel) :
It seems you are trying to fit a model from mgcv that was fit with an extended.family. Simulation functions for these
families are not yet implemented in DHARMa. See issue https://github.com/florianhartig/DHARMa/issues/11 for updates
about this
However,
b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
simulateResiduals(b0)
Works and we get:
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
Registered S3 method overwritten by 'mgcViz':
method from
+.gg GGally
Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
Scaled residual values: 0.6321292 0.7855354 0.6275612 0.6452824 0.1096674 0.8805173 0.8338119 0.8256728 0.5613169 0.5300937 0.98 0.2534163 0.8323018 0.3091923 0.7534282 0.1913759 0.7070032 0.3500909 0.9570273 0.6956188 ...
Here's the output from sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] DHARMa_0.4.5 mgcv_1.8-37 nlme_3.1-152
loaded via a namespace (and not attached):
[1] minqa_1.2.4 MASS_7.3-54 compiler_4.1.1 Matrix_1.3-4 tools_4.1.1
[6] Rcpp_1.0.7 splines_4.1.1 grid_4.1.1 nloptr_1.2.2.2 boot_1.3-28
[11] lme4_1.1-27.1 lattice_0.20-44
Thanks - and apologies if this isn't supposed to be fixed yet!
OK, sorry, I still had the check that throws the error in the code, try again
I haven't checked all possible options, but everything that simulates with mgcViz should work now!
Yes, it's working now.
Thanks so much!
OK, I think this is solved, will go to CRAN in DHARMa 0.4.5. Thanks for your comments!
As a side note, @gavinsimpson @mfasiolo - if I understand Gavin's remarks correctly, both gratia and mgcViz implement the same functionality regarding simulate, so maybe an option to pull both things together long-term? For the moment, I am using mgcViz::simulate, but happy to switch to whatever is the suggested long-term solution to simulate from an mgcv::gam object.
And thanks for your clarification @gavinsimpson
Hi Florian, I think that it would be nice to move simulate.gam to the mgcvUtils package. The problem is finding the time to work on this kind of "tidying up" activities! Lately, I barely manage to fix bugs and do very minor changes to mgcViz and qgam.
Thanks again Florian - that great!
Sent from my Bell Samsung device over Canada's largest network.
-------- Original message -------- From: Florian Hartig @.> Date: 2021-12-02 10:02 a.m. (GMT-03:30) To: florianhartig/DHARMa @.> Cc: Dave Fifield @.>, Comment @.> Subject: Re: [florianhartig/DHARMa] Implement support for extended families and REs in mgcv (#11)
OK, I think this is solved, will go to CRAN in DHARMa 0.4.5. Thanks for your comments!
— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://can01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fflorianhartig%2FDHARMa%2Fissues%2F11%23issuecomment-984631087&data=04%7C01%7CDave.Fifield%40ec.gc.ca%7C5dc578f3fec246ddfbbb08d9b598394b%7C740c5fd36e8b41769cc9454dbe4e62c4%7C0%7C0%7C637740487664366428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=e9E8dm%2Ba7HJAimUVv3%2BbYX4nrt1ngBbT3%2Fi5TebhnSc%3D&reserved=0, or unsubscribehttps://can01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAPJOH2JQJCHGXKZIET7OCDUO5YPXANCNFSM4CZHFMBA&data=04%7C01%7CDave.Fifield%40ec.gc.ca%7C5dc578f3fec246ddfbbb08d9b598394b%7C740c5fd36e8b41769cc9454dbe4e62c4%7C0%7C0%7C637740487664376384%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Er1H0NQFkyn4KFbCEuR7FQl2NZH%2BQ9v%2F1xIaF3sGcWg%3D&reserved=0.
Great! @mfasiolo - I feel you ;)
fittedModel$family$simulate() will work for standard families, but
This means DHARMa can currently not simulate from any of the extended families, and only from the top random level.
Perspective for this request: In general, I am reluctant to implement simulate functions in DHARMa, for reasons explained in https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA However, once a simulate function would be available in mgcv, it would be very easy to integrate all it's capabilities in DHARMa. Feel free to request a simulate function from the package developers.
Interim solution for users that want to use DHARMa with glmmTMB: take your fitted model, create a simulate function for this model structure yourself, and then use createDHARMa (see help), this will allow most options of the package to be run. The vignette has some further comments / examples on creating custom simulation functions and reading them into DHARMa