fhui28 / CBFM

Spatio-temporal joint species distribution modeling using community-level basis functions

Hierarchical or grouped smoothing? #18

Open fhui28 opened 2 years ago

fhui28 commented 2 years ago

Seeing if https://peerj.com/articles/6876/ can be adapted to the CBFM framework. What @chrishaak wants may be more sophisticated than this, though, because it involves specific groups/subsets of species.

fhui28 commented 2 years ago

If you consider smoothing terms,

What @chrishaak wants is a bit more interesting: we have G << s groups of species, e.g., each group consists of the juveniles and adults of an OTU. Both S and I models are possible in this scenario, but I presume the latter is what we actually want, i.e., each group has its own wiggliness.

So there is a case for allowing both S and I models in CBFM...and allowing an a-priori grouping factor.
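For reference, a minimal sketch of what the two options look like in plain mgcv (not CBFM), following the model S and model I formulations in the paper linked above. The data are simulated, and the names `dat`, `grp`, `mod_S` and `mod_I` are made up purely for illustration; `grp` stands in for the a-priori grouping factor.

```r
library("mgcv")

## simulated data: three hypothetical groups with different curve shapes
set.seed(1)
n_per <- 100
grp <- factor(rep(c("g1", "g2", "g3"), each = n_per))
x <- runif(3 * n_per)
mu <- ifelse(grp == "g1", sin(2 * pi * x),
             ifelse(grp == "g2", sin(6 * pi * x), 2 * x))
dat <- data.frame(y = mu + rnorm(3 * n_per, sd = 0.3), x = x, grp = grp)

## model S: group-level smooths that share their smoothing parameters
mod_S <- gam(y ~ s(x, grp, bs = "fs", k = 10),
             data = dat, method = "REML")

## model I: one smooth per group, each with its own smoothing parameter,
## plus a random intercept for the grouping factor
mod_I <- gam(y ~ s(x, by = grp, k = 10) + s(grp, bs = "re"),
             data = dat, method = "REML")

## compare the estimated smoothing parameters of the two fits
mod_S$sp
mod_I$sp
```

In model I the `by = grp` term expands into a separate smooth for each level of `grp`, which is what lets each group have its own wiggliness.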


One issue: I am not sure whether the above can actually be done with mgcv for parametric covariates.

@chrishaak if you can comment on the above that would be much appreciated!

fhui28 commented 2 years ago

This will be harder than I first thought! At the moment I think using factor-smooth interactions, or the by argument, will break the capacity to have species-specific dispersion parameters that need to be estimated.

fhui28 commented 2 years ago

Interesting: with the GS model, the way https://peerj.com/articles/6876/ seems to employ it, the global smooth (G) has its own smoothing parameter, while the species-specific curves (S) all share a single smoothing parameter that can differ from that of the global smooth.

Easy to fix this though using the id command in mgcv. But maybe that was deliberate by the authors?

gavinsimpson commented 2 years ago

Easy to fix this though using the id command in mgcv. But maybe that was deliberate by the authors?

Yes, this was deliberate. Because the fs basis is fully penalized, one interpretation of the individual subject-specific smooths is as a smooth deviation of each subject from the global smooth. Under that interpretation, it is desirable to have a potentially wigglier global smooth to capture the common relationship, while deviations from that relationship are, all else equal, likely to be simpler, i.e. less wiggly.

Using the id mechanism might be tricky in this case, as the fs basis has more smoothing parameters (3, one each for the "random" intercepts, "random" slopes, and "random" smooths) than the global smooth, and {mgcv} isn't able to tie the penalty and smoothing parameter of the global smooth to the third penalty and smoothing parameter of the fs smooth.

Consider the GS model for the CO2 data from the paper:

## load packages
library("gratia")
library("mgcv")
library("ggplot2")
library("datasets")
## data load and prep
data(CO2, package = "datasets")
CO2 <- transform(CO2, Plant_uo = factor(Plant, ordered = FALSE))
ctrl <- gam.control(nthreads = 3)

## CO2 - try to constrain G and S smooths to have same smoothing parameters
CO2_mod2 <- gam(log(uptake) ~ s(log(conc), k = 5, m = 2, id = "a") +
                    s(log(conc), Plant_uo, k = 5,  bs = "fs", m = 2, id = "a"),
                data = CO2, method = "REML", family = gaussian(),
                control = ctrl)

which fails for the reason I surmised:

r$> CO2_mod2 <- gam(log(uptake) ~ s(log(conc), k = 5, m = 2, id = "a") + 
                                s(log(conc), Plant_uo, k = 5,  bs = "fs", m = 2, id = "a"), 
                            data = CO2, method = "REML", family = gaussian(), 
                            control = ctrl)
Error in clone.smooth.spec(split$smooth.spec[[base.i]], split$smooth.spec[[i]]) : 
  `id' linked smooths must have same number of arguments
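
The mismatch can be seen by fitting the same GS model without the id argument and counting the smoothing parameters attached to each smooth. This is only a sketch: the object name `CO2_modGS` is chosen here for illustration, and the counts are left for the reader to inspect rather than stated.

```r
library("mgcv")

## same data prep as above
data(CO2, package = "datasets")
CO2 <- transform(CO2, Plant_uo = factor(Plant, ordered = FALSE))

## GS model without id: each smooth keeps its own smoothing parameter(s)
CO2_modGS <- gam(log(uptake) ~ s(log(conc), k = 5, m = 2) +
                     s(log(conc), Plant_uo, k = 5, bs = "fs", m = 2),
                 data = CO2, method = "REML", family = gaussian())

## named vector of estimated smoothing parameters, one per penalty
CO2_modGS$sp

## number of penalty matrices carried by each smooth term
sapply(CO2_modGS$smooth, function(sm) length(sm$S))
```

If the fs term reports more penalties than the global term, there is no one-to-one pairing for id to link.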

fhui28 commented 2 years ago

Thanks heaps Gavin!

I did not expect you to find this comment, as it was more of a random conversation that @chrishaak and I were having a while back, so thank you for the pleasant surprise =D

What you say makes sense, and honestly I do not recall the precise ecological motivation driving this conversation now, so I might leave it here for future reference in case Chris and I come back to it.

chrishaak commented 2 years ago

It could make sense to assume that the global smooth/shared trend would be "simplified" compared to the species-level smooths, which could be allowed to be more complex?
