gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package
https://gavinsimpson.github.io/gratia/
Other
206 stars 28 forks source link

Comments on JOSS paper #296

Open dill opened 4 months ago

dill commented 4 months ago

(Part of review here.)

This was fun to read and I learnt some things :)

Some minor statistical/editorial comments on the paper:

gavinsimpson commented 4 months ago

The author name is shown as "¿citation_author?" in the pdf on the gratia github. Not sure if this is normal.

This seems to be OK in version generated by the review system itself (I think I have something wrong in the YAML meta data for the the citation_author element which is causing the problem there.) So I'll leave this just now (beyond tweaking the citation_author in the YAML) and figure it out if the issue presents later.

dill commented 4 months ago

Further to our conversation, some thoughts on the improper prior issue:


# get some data
library(mgcv)
set.seed(2)
dat <- gamSim(1,n=400,dist="normal",scale=2)

# tprs has a null space dimension of 1 -- cool
gam(y~s(x0), data=dat)$smooth[[1]]$null.space.dim

# ts has a null space dimension of 0 -- also cool
gam(y~s(x0, bs="ts"), data=dat)$smooth[[1]]$null.space.dim

# what about B-splines with a continuous penalty? hmmm
gam(y~s(x0, bs="bs"), data=dat)$smooth[[1]]$null.space.dim

# now using code from the paper ;)
library(gratia)
library(dplyr)
library(ggplot2)

# basis get
bfs <- basis(gam(y~s(x0, bs="bs"), data=dat), select = "s(x0)")

 # plot basis functions and fitted spline
bfs |>
   ggplot(aes(x = x0, y = .value)) +
   geom_line(aes(colour = .bf)) +
   guides(colour = "none") +
   labs(x = "x", y = "y", title = "s(x)")

Well that gives us this:

Screenshot_2024-07-16_15-59-49

which I don't see any "linear" terms in... ¯_(ツ)_/¯

Going back to the boring situation in which we "read the documentation", in ?smooth.construct we have:

null.space.dim: The dimension of the penalty null space (before centering).

So I think this is before identifiability constraints are applied (since they need to be applied at the model, not term level. So I think the nullspace just consists of the intercept in that case.

Doing a tiny bit more digging:

jagam(y~s(x0, bs="bs"), data=dat, centred=FALSE, file="m1.jags")
jagam(y~s(x0, bs="bs")-1, data=dat, centred=FALSE, file="m0.jags")

Produces:

model {
  mu <- X %*% b ## expected response
  for (i in 1:n) { y[i] ~ dnorm(mu[i],tau) } ## response 
  scale <- 1/tau ## convert tau to standard GLM scale
  tau ~ dgamma(.05,.005) ## precision parameter prior 
  ## prior for s(x0)... 
  K1 <- S1[1:10,1:10] * lambda[1]  + S1[1:10,11:20] * lambda[2]
  b[1:10] ~ dmnorm(zero[1:10],K1) 
  ## smoothing parameter priors CHECK...
  for (i in 1:2) {
    lambda[i] ~ dgamma(.05,.005)
    rho[i] <- log(lambda[i])
  }
}
model {
  mu <- X %*% b ## expected response
  for (i in 1:n) { y[i] ~ dnorm(mu[i],tau) } ## response 
  scale <- 1/tau ## convert tau to standard GLM scale
  tau ~ dgamma(.05,.005) ## precision parameter prior 
  ## Parametric effect priors CHECK tau=1/79^2 is appropriate!
  for (i in 1:1) { b[i] ~ dnorm(0,0.00016) }
  ## prior for s(x0)... 
  K1 <- S1[1:10,1:10] * lambda[1]  + S1[1:10,11:20] * lambda[2]
  b[2:11] ~ dmnorm(zero[2:11],K1) 
  ## smoothing parameter priors CHECK...
  for (i in 1:2) {
    lambda[i] ~ dgamma(.05,.005)
    rho[i] <- log(lambda[i])
  }
}

So I think that makes sense? (Running to stand in front of my poster so I will think about this more in a wee bit!)

gavinsimpson commented 4 months ago

Going back to the boring situation in which we "read the documentation", in ?smooth.construct we have:

null.space.dim: The dimension of the penalty null space (before centering). So I think this is before identifiability constraints are applied (since they need to be applied at the model, not term level. So I think the nullspace just consists of the intercept in that case.

But for that to be true, wouldn't the null space dimension for a TPRS (the default) be = 2 because it has both constant and linear basis functions, yet as you show it is 1.

If we do

library("mgcv")
library("gratia")

df <- data_sim("gwf2", seed = 2)
m <- gam(y ~ s(x, bs = "cr"), data = df)

# extract penalty
S <- m$smooth[[1]]$S[[1]]

# eigen decompoisition
eigen(S)

producing

> eigen(S)
eigen() decomposition
$values
[1]  2.201890e+00  1.420716e+00  8.582862e-01  4.449231e-01  1.969655e-01
[6]  7.046850e-02  1.825633e-02  2.373200e-03 -6.021430e-17

where 1 eigenvalue is numerically 0 and the matrix is not invertible by normal means.

$> solve(S)
Error in solve.default(S) :
  system is computationally singular: reciprocal condition number = 4.66992e-19

And this is what I would expect, because a linear function is still in the span of basis, even if there is no actual linear function.

Now consider the same model but with select = TRUE, where we include an additional penalty on the null space:

m <- gam(y ~ s(x, bs = "cr"), data = df, select = TRUE)

Now the null space of this smooth is 0, as it should be because of the extra penalty:

> m$smooth[[1]]$null.space.dim
[1] 0

Some more discussion is needed here...

gavinsimpson commented 4 months ago

Some further reading has helped, if we're just talking about the Null space of a penalty. The key point I think is that the null space is the span of functions representable by the basis for which the penalty has no effect (and hence it's size is the number of such representable functions that are unaffected by the penalty). The null space doesn't refer to specific basis functions but to the functions representable by that basis.

Section 5.4.3 of Simon's GAM book has:

BB0B45E9-91FB-4F35-B194-73D0F06C1446

which I think is clear. This explains the findings in my comment above. I have assumed that the rank deficiency of $S$ is what leads to the improper prior when we view penalized smoothing from a Bayesian perspective. But I should read a little more about that...

gavinsimpson commented 4 months ago

Here's some clearer evidence that the smooths have a null space. This needs the very latest version of gratia from github but equivalent results can be achieved with mgcv::smoothCon(..., diagonal.penalty = TRUE, absorb.cons = TRUE).

We can rotate the basis functions using an eigendecomposition to reparameterize the smooth such that it has a diagonal (identity) penalty. The null space is then highlighted because the diagonal elements are zero for the null space.

remotes::install_github("gavinsimpson/gratia")
library("gratia")
library("patchwork")

df <- data_sim("gwf2", seed = 2)

p1 <- basis(s(x, bs = "bs"), data = df,
    constraints = TRUE, diagonalize = TRUE) |>
  draw()

p2 <- penalty(s(x, bs = "bs"), data = df,
    constraints = TRUE, diagonalize = TRUE) |>
  draw()

p1 + p2 + plot_layout(ncol = 2, nrow = 1)

penalty-null-space-fig

Here, we're using a B-spline basis and have absorbed the sum-to-zero identifiability constraint into the basis and then reparameterized to diagonalize the penalty. Absorbing the identifiability constraint means the constant / zero function is not in the span of the basis. But the linear functions remain in the span of the basis. This is clear from the reparametrisation shown above, where we see an actual linear basis function and a zero row/column in the penalty matrix.

The reparameterisation doesn't change anything with respect to the original basis, it just highlights the null space of the basis more clearly.

dill commented 3 months ago

After some extensive chat here, online and offline (by nice coincidence at the same conference), we've got somewhere on this (the improper prior business)... I'll write this collectively for both our findings but if @gavinsimpson needs to correct me anywhere, I trust that will happen.

Does that cover everything @gavinsimpson?

dill commented 3 months ago

Then two additional comments because @gavinsimpson mistakenly asked me to look more closely at his equations...

(this list interrupted by my realising that github now renders $\LaTeX$, wow! That's going to make the next bit easier...)

gavinsimpson commented 3 days ago

@dill

The formulation for the GAM (first display equation) doesn't allow for multivariate smooths.

Fixed; Dave and I spoke about this while at ISEC this summer and we couldn't really think of a better way to represent the most common form of a GAM than tediously write out multiple smooth functions $f_j()$ in the equation. So, I have updated the manuscript to match Dave's suggestion and usage in his Bayesian interpretation of GAMs preprint.

...In addition you only talk about scale parameters as the other parameters of the distribution.

This was intentional; just wanted to talk about a standard GLM-like GAM. I have now written out the Tweedie distributional model in full later.

"GAMs fitted by mgcv are an empirical Bayesian model with an improper multivariate normal prior on the basis function coeficients." Is that always the case? One might argue that the intercept is always in the nullspace of every penalty, but one could remove that from the model and have a proper MVN prior? Could just put the improper in brackets with the word "usually"?

Done.

"which, with the default penalty, is the integrated squared second derivative" -- I don't think this is quite right.

Done.

Is Figure 1a the basis functions or the basis functions multiplied by their corresponding coefficients? I think it's the latter?

Yep, it's the latter. Fixed.

In previous snippet, coefficients, not "coeficients"

Fixed

Miller (2019) should be Miller (2021) -- I updated it on arXiv.

Done

"The response is assumed to be conditionally distributed Tweedie" -> "The response is assumed to be conditionally Tweedie distributed"?

Done

"Model coefficients and smoothing parameters are estimated using restricted maximum likelihood (Wood, 2011)" missing period at end of sentence?

Done

Explanation of ctrl <- gam.control(nthreads = 10)?

Have added a small explanation

"which show significant heteroscedasticity and departure from the condtional distribution of the response given the model" -- could highlight that we see this due to the increasing spread in the deviance residuals wrt values of the linear predictor in the top right plot?

Done

"given the absence of important effects in the model" not sure what this means?

I was trying to convey the point that given we don't have many potentially important covariates, the original model may be too inflexible to capture the variation that would be explained by the missing covariates if we had them. If have added

"which may be too inflexible if missing covariates mean that important effects are not included in the model"

to try to better explain what I mean.

Done

In the previous snippet "condtional" -> "conditional"

Done

In the Tweedie LSS model lead-in, it looks like you're using \varphi rather than \phi for the scale parameter (as you did in the intro), not sure if this was intentional?

It was intentional; I used $\phi$ for the GLM dispersion parameter, while I use $\varphi$ for the scale parameter of the Tweedie. I think the twlss() model still has a dispersion parameter but, like the Poisson, it is fixed to equal 1.

But, it was a little confusing. I think I have things right now; I still want to distinguish between the dispersion parameter (it just happens to be the Tweedie scale parameter in model m1) in general when I write the general GAM equation and the Tweedie scale parameter when I am talking about the Tweedie distribution and the distributional GAM, which is more specific.

"little data wrangling, we can produce an uncertaintry estimate ,using fitted_samples() to" misplaces comma.

Fixed

Miller (2019) should be Miller (2021) In the references: ... [mistakes in the bibliography]

I have now fixed the ones you found. I missed the brms one and the other things were weird entries in my ref manager metadata. I'll do another sweep once the reviews are concluded.

Generally, I think folks use... (from a more recent comment) on combining the...

Done and I think Done. Perhaps you could give this last part (the one about combining the $\gamma$s in with the $\beta$s

gavinsimpson commented 3 days ago

@dill Thanks all the comments and suggestions and importantly taking the time to talk through some of the prior / null space things we both touch on in the comments above.

I've addressed all your comments (see above) bar 1:

It would be nice to have a legend in Figure 1 so readers can relate the basis functions to the values in the penalty matrix (and e.g., see that the most wiggly function has the highest penalty)? Open to push-back on this, if you want to just give default plots from gratia

Done now. The basis function plot was not a plot that was generated by one of gratia's plots, the penalty plot was but the basis function one needed some level of customisation anyway, so I added labelled lines.

I'll push this and then GH Actions will render the pdf to check. Will need to be looking in the joss-paper branch.