florianhartig / DHARMa

Diagnostics for HierArchical Regession Models
http://florianhartig.github.io/DHARMa/
212 stars 22 forks source link

Implement support for extended families and REs in mgcv #11

Closed florianhartig closed 2 years ago

florianhartig commented 7 years ago

fittedModel$family$simulate() will work for standard families, but

This means DHARMa can currently not simulate from any of the extended families, and only from the top random level.

Perspective for this request: In general, I am reluctant to implement simulate functions in DHARMa, for reasons explained in https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA However, once a simulate function would be available in mgcv, it would be very easy to integrate all it's capabilities in DHARMa. Feel free to request a simulate function from the package developers.

Interim solution for users that want to use DHARMa with glmmTMB: take your fitted model, create a simulate function for this model structure yourself, and then use createDHARMa (see help), this will allow most options of the package to be run. The vignette has some further comments / examples on creating custom simulation functions and reading them into DHARMa

florianhartig commented 6 years ago

extended simulate functions in https://github.com/mfasiolo/mgcViz

dill commented 4 years ago

Just wanted to add a note here that also the Tweedie and negbin appear to not be supported (as well as the tw and nb as mentioned above).

Trying the example for Tweedie/tw:

library(mgcv)
set.seed(3)
n <- 400
dat <- gamSim(1,n=n,dist="poisson",scale=.2)
dat$y <- rTweedie(exp(dat$f),p=1.3,phi=.5) 
b2 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=tw(),
               data=dat,method="REML")

as expected:

> simulateResiduals(b2)
Error in checkModel(fittedModel) : 
  It seems you are trying to fit a model from mgcv that was fit with an extended
.family. Simulation functions for these families are not yet implemented in DHAR
Ma. See issue https://github.com/florianhartig/DHARMa/issues/11 for updates abou
t this

But now:

b1 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=Tweedie(1.25,power(.1)),
            data=dat,method="REML")

yields:

> simulateResiduals(b1)
Error in simulate.lm(object, nsim = nsim, ...) : 
  family 'Tweedie(1.25)' not implemented

But Tweedie is not an extended.family object:

> class(b1$family)
[1] "family"
> class(b2$family)
[1] "extended.family" "family"         

The same is true for the examples in ?negbin:

library(mgcv)
set.seed(3)
n<-400
dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
dat$y <- rnbinom(g,size=3,mu=g)

b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)

We get:

> simulateResiduals(b)
Error in checkModel(fittedModel) : 
  It seems you are trying to fit a model from mgcv that was fit with an extended
.family. Simulation functions for these families are not yet implemented in DHAR
Ma. See issue https://github.com/florianhartig/DHARMa/issues/11 for updates abou
t this

and

> simulateResiduals(b0)
Error in simulate.lm(object, nsim = nsim, ...) : 
  family 'Negative Binomial(3)' not implemented

Apologies if this is a duplicate of a bug elsewhere, I didn't see it when searching. Using DHARMa installed from github (version 0.3.2.0).

florianhartig commented 4 years ago

Hi David, I assume you were looking at this because of my ISEC presentation, where I said wrt to a question that Tweedie is supported?

When I said this, I meant that it is in principle supported from the DHARMa side (I recently fixed some issue regarding mixed distributions), but of course your regression package must still be able to simulate a fitted model with the tweedie, and if that is not the case you have to simulate by hand.

For mgcv, this is apparently not the case. The issue here is really on the mgcv side, so as long a Simon doesn't implement the simulate() function for this distributions, there is little I can do about this for gam. I think he's aware of the issue, but it's not very high on his list. I should have a look at Matteo's work with mgcViz though, maybe this solves the problem.

You are right though, the help / error messages should be clarified!

florianhartig commented 4 years ago

p.s. - you can fit tweedie with glmmTMB, but of course, not with a gam.

dill commented 4 years ago

Thanks for the quick reponse! Yes, I was in the ISEC presentation (which was great, thank you!).

Totally understand the issue re: lack of a simulate method, just wanted to make sure that others who tried this out would find the issue and know what to do. The error message for nb/tw is great if that can be used for the other options (with modified text around the extended family part) that would make things clear I think.

florianhartig commented 4 years ago

yes, that is what I will try to do - thanks!

mfasiolo commented 4 years ago

Hi both, I wonder whether the best solution would be to create a package called, say, gamUtils which provides all the things that we would like to have but are currently missing in mgcv, such as simulators for all the available families. Then, Dharma and mgcViz could depend on it.

simulate.gam() in mgcViz simulates responses using either the family$rd() or the family$qf() functions but does not know what to do if those are missing. An extra package could provide those missing things, while avoiding having duplicated code in mgcViz, DHarma, etc.

florianhartig commented 4 years ago

Sure, why not. Of course, this only makes sense if there is no chance that Simon will implement this directly in mgcv (which would be preferable imo)

dill commented 4 years ago

I'd be happy to contribute to this, if I can be helpful @mfasiolo. Perhaps this work in the simulate package is a good place to start?

mfasiolo commented 4 years ago

That would be great @dill! I was thinking of doing this in the context of the gamFactory package, but maybe a separate "gamUtils" (say) package would be preferable.

@florianhartig I agree that mgcv would be the best place to provide such functions, but note that potentially there are a lot of family-specific utilities that could be useful for model checking (simulation, quantile function, CDF, expected values, ...). Simon has enough work to do with GAM fitting methods, hence providing such utilities outside mgcv seems a more realistic option at the moment.

dill commented 4 years ago

Okay I've created a gamUtils repo to hold further discussion on this so this doesn't overwhelm @florianhartig's mailbox (but please join the discussion there if you're interested 😄).

gavinsimpson commented 4 years ago

Just to jump in here; I have a simulate.gam() method in gratia that will work as per simulate() in base R for "gam" class objects (there's methods for "gamm" and "scam" too) if mgcv's fix.family.rd() can add the random number generator to the family or it is already inside the family objects of the model as family(model)$rd.

florianhartig commented 4 years ago

Hi guys, this is an excellent. I'll be happy to lean back and include any solution to simulate from mgcv that works :)

@gavinsimpson , one question - in your package, it says "Simulate from the posterior distribution", so are you simulating with parameter uncertainty? This is interesting, but it would be useful to be able to switch this off, to conform to all the other packages.

florianhartig commented 3 years ago

Hi all,

I realise that this is an old thread but I have finally managed to work on this and may have found an acceptable solution for the moment, which is, with a few workarounds, to optionally make DHARMa depend on mgcViz.

I have seen you have worked on https://github.com/dill/gamUtils, and I'm happy to switch to this, but for DHARMa I am very hesitant to depend on something that is not on CRAN.

@gavinsimpson, thanks for your suggestion, I have also looked at gratia, but from what I understood, the simulations are done with parameter uncertainties, i.e. something like posterior predictive simulations, which would break the logic of DHARMa, as all other simulations are done from the MLE. I also think that, for quantile residual checks, simulations from the MLE are preferable, as the logic here is to test if a point estimate (and not an entire interval) can be rejected by the data.

florianhartig commented 3 years ago

p.s. in case someone wants to check this out the branch / PR for the inclusion of mgcViz is here https://github.com/florianhartig/DHARMa/pull/308

dill commented 3 years ago

this is great @florianhartig. gamUtils has taken a back seat due to other time commitments for me at the moment, but I'm very happy for any solution to this that enables people to use DHARMa.

gavinsimpson commented 3 years ago

@gavinsimpson, thanks for your suggestion, I have also looked at gratia, but from what I understood, the simulations are done with parameter uncertainties, i.e. something like posterior predictive simulations, which would break the logic of DHARMa, as all other simulations are done from the MLE. I also think that, for quantile residual checks, simulations from the MLE are preferable, as the logic here is to test if a point estimate (and not an entire interval) can be rejected by the data.

@florianhartig This may actually a bug in the documentation. gratia:::simulate.gam (and co) simulates new values of the response conditional on the estimated model. It's basically a predict.gam() to get the expected value of the response given the model followed by a call to the corresponding rXXXX() for the assumed conditional distribution of Y (rnorm() for Gaussian etc) where these exist.

However, ?simulate.gam talks about sampling from posteriors and it's wrong; it's a hangover from when simulate.gam was used to generate posterior draws for individual smooths. I realized this was not compatible with the intention of things in base R like simulate.lm etc, and so I changed simulate.gam to work like the base R implementations for LM, GLMs etc. Updating the documentation seems not to have happened when I did this however.

Things like fitted_samples(), predicted_samples() do take draws from the posterior distribution of expected values and the response respectively using a Gaussian approximation to the posterior (and hopefully soon via a Metropolis Hasting sampler via mgcv::gam.mh())

gavinsimpson commented 3 years ago

@florianhartig Sorry I missed your question about simulate.gam and posterior sampling from a year ago - I could have clarified that then (and fixed the documentation sooner).

florianhartig commented 2 years ago

Should be done with https://github.com/florianhartig/DHARMa/pull/308, I leave this open for a final check!

dfifield commented 2 years ago

Do I understand correctly that tw() and nb() should be working now at 0.4.5?

I stole the nb() and tw() examples that @dill gave above and tried them out with 0.4.5 :

set.seed(3)
n <- 400
dat <- gamSim(1,n=n,dist="poisson",scale=.2)
dat$y <- rTweedie(exp(dat$f),p=1.3,phi=.5) 
b2 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=tw(), data=dat,method="REML")
simulateResiduals(b2)

yields

Error in checkModel(fittedModel) : 
  It seems you are trying to fit a model from mgcv that was fit with an extended.family. Simulation functions for these 
families are not yet implemented in DHARMa. See issue https://github.com/florianhartig/DHARMa/issues/11 for updates 
about this

Likewise, for:

dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
dat$y <- rnbinom(g,size=3,mu=g)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)
simulateResiduals(b)

We get

Error in checkModel(fittedModel) : 
  It seems you are trying to fit a model from mgcv that was fit with an extended.family. Simulation functions for these 
families are not yet implemented in DHARMa. See issue https://github.com/florianhartig/DHARMa/issues/11 for updates 
about this

However,

b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
simulateResiduals(b0)

Works and we get:

Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
Registered S3 method overwritten by 'mgcViz':
  method from  
  +.gg   GGally
Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 

Scaled residual values: 0.6321292 0.7855354 0.6275612 0.6452824 0.1096674 0.8805173 0.8338119 0.8256728 0.5613169 0.5300937 0.98 0.2534163 0.8323018 0.3091923 0.7534282 0.1913759 0.7070032 0.3500909 0.9570273 0.6956188 ...

Here's the output from sessionInfo()

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DHARMa_0.4.5 mgcv_1.8-37  nlme_3.1-152

loaded via a namespace (and not attached):
 [1] minqa_1.2.4     MASS_7.3-54     compiler_4.1.1  Matrix_1.3-4    tools_4.1.1    
 [6] Rcpp_1.0.7      splines_4.1.1   grid_4.1.1      nloptr_1.2.2.2  boot_1.3-28    
[11] lme4_1.1-27.1   lattice_0.20-44

Thanks - and apologies if this isn't supposed to be fixed yet!

florianhartig commented 2 years ago

OK, sorry, I still had the check that throws the error in the code, try again

florianhartig commented 2 years ago

I haven't checked all possible options, but everything that simulates with mgcViz should work now!

dfifield commented 2 years ago

Yes, it's working now.

Thanks so much!

florianhartig commented 2 years ago

OK, I think this is solved, will go to CRAN in DHARMa 0.4.5. Thanks for your comments!

florianhartig commented 2 years ago

As a side note, @gavinsimpson @mfasiolo - if I understand Gavin's remarks correctly, both gratia and mgcViz implement the same functionality regarding simulate, so maybe an option to pull both things together long-term? For the moment, I am using mgcViz::simulate, but happy to switch to whatever is the suggested long-term solution to simulate from an mgcv::gam object.

And thanks for your clarification @gavinsimpson

mfasiolo commented 2 years ago

Hi Florian, I think that it would be nice to move simulate.gam to the mgcvUtils package. The problem is finding the time to work on this kind of "tidying up" activities! Lately, I barely manage to fix bugs and do very minor changes to mgcViz and qgam.

dfifield commented 2 years ago

Thanks again Florian - that great!

Sent from my Bell Samsung device over Canada's largest network.

-------- Original message -------- From: Florian Hartig @.> Date: 2021-12-02 10:02 a.m. (GMT-03:30) To: florianhartig/DHARMa @.> Cc: Dave Fifield @.>, Comment @.> Subject: Re: [florianhartig/DHARMa] Implement support for extended families and REs in mgcv (#11)

OK, I think this is solved, will go to CRAN in DHARMa 0.4.5. Thanks for your comments!

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://can01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fflorianhartig%2FDHARMa%2Fissues%2F11%23issuecomment-984631087&data=04%7C01%7CDave.Fifield%40ec.gc.ca%7C5dc578f3fec246ddfbbb08d9b598394b%7C740c5fd36e8b41769cc9454dbe4e62c4%7C0%7C0%7C637740487664366428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=e9E8dm%2Ba7HJAimUVv3%2BbYX4nrt1ngBbT3%2Fi5TebhnSc%3D&reserved=0, or unsubscribehttps://can01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAPJOH2JQJCHGXKZIET7OCDUO5YPXANCNFSM4CZHFMBA&data=04%7C01%7CDave.Fifield%40ec.gc.ca%7C5dc578f3fec246ddfbbb08d9b598394b%7C740c5fd36e8b41769cc9454dbe4e62c4%7C0%7C0%7C637740487664376384%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Er1H0NQFkyn4KFbCEuR7FQl2NZH%2BQ9v%2F1xIaF3sGcWg%3D&reserved=0.

florianhartig commented 2 years ago

Great! @mfasiolo - I feel you ;)