dill / mgcvUtils

Various mgcv-related GAM utilities
Other
10 stars 0 forks source link

`simulate.gam` function for use with `DHARMa` #1

Open dill opened 4 years ago

dill commented 4 years ago

As reported in DHARMa issue 11 "fittedModel$family$simulate() will work for standard families" but these do not work for extended family distributions or for Tweedie/negative binomial implementations in mgcv.

Some work has been started here on a version of simulate.gam which may be useful.

There may be also use for this function in mgcViz and gamFactory.

The aim is to implement those families not currently available.

Some simple subtasks

gavinsimpson commented 4 years ago

I have a simulate.gam function in gratia that apes the simulate methods from base R. There's also tidy versions of those things in the predicted_samples()

But it does need extending for all families in mgcv as those families aren't handled by the fix.family.rd() function in mgcv.

dill commented 4 years ago

Should have known to check gratia first!

This seems to fix the issues I reported about Tweedie and negbin (if gratia is loaded, they work). It seems that DHARMa intervenes when using extended.family responses so gratia doesn't get a look in.

mfasiolo commented 4 years ago

Similarly to gratia, mgcViz includes a simulate.gam method which relies on fix.family.rd( ) or fix.family.qf() to simulate responses. I wonder whether gamUtils could include more complete fix.family.xxx functions to complement what mgcv provides.

dill commented 4 years ago

Ah ace!

Okay now we have two options of where to start (or could start from scratch?). Slightly awkward question to ask you both, but do you have strong feelings about which of your implementations would be a good starting point?

My temptation is to use the mgcViz version as a starting point (if that's agreeable to you both) since it can use the inverse quantile method if the rd method is not available. BUT I don't want to step on anyone's toes, so let me know if this isn't okay with etiher of you!

@mfasiolo: I think adding complementary fix.family.* methods would be a great thing to do. I'll open another issue with that in, as it's a bit more general than just the simulate.gam method.

gavinsimpson commented 4 years ago

@dill That's fine with me as @mfasiolo's is more complete. We should probably drop the dependency on plyr though (for gamUtils) and would be happy to convert the plyr calls in Matteo's version to base R equivalents.

I also have some code for the gevlss family that isn't in gratia yet that IIRC can provide what is needed for qf or rd based sampling.

mfasiolo commented 4 years ago

It all sounds good, I have mixed feelings about the calls to plyr too.

dill commented 4 years ago

Great! I've added the function as-is for now to see what works and what doesn't.

Here's a list of responses and whether they are supported at the moment (those which are not checked I have not had a chance to look into yet, those with 🙅 I looked at and have errors).

in family:

in family.mgcv:

mfasiolo commented 4 years ago

Great job David, you are a Github ninja! ;)

The gaulss() family should work, as the gaulss$rd() is defined in mgcv.

mvn works but the simulation method is defined in simulate.gam() rather than within the mvn()$rd slot, so this is a candidate for fix.family.rd().

gevlss() is not covered, but the appropriate $rd function should be:

family$rd <- function(mu, wt, scale) {
    return(mu[ , 1] + exp(mu[ , 2]) * 
             ((-log(runif(nrow(mu))))^(-mu[ , 3])-1) / mu[ , 3])
  }

Dumb question: does it make sense to simulate from a quasi-family?

dill commented 4 years ago

I've added in gevlss witin the simulate.gam function for now, so the functionality is there, but eventually should be exported to whatever we decide on the fix.family front in #5

re: quasi-families, I think it doesn't make sense but I was copy-pasting from ?family for completeness.