Open dill opened 4 months ago
The author name is shown as "¿citation_author?" in the pdf on the gratia github. Not sure if this is normal.
This seems to be OK in version generated by the review system itself (I think I have something wrong in the YAML meta data for the the citation_author
element which is causing the problem there.) So I'll leave this just now (beyond tweaking the citation_author
in the YAML) and figure it out if the issue presents later.
Further to our conversation, some thoughts on the improper prior issue:
# get some data
library(mgcv)
set.seed(2)
dat <- gamSim(1,n=400,dist="normal",scale=2)
# tprs has a null space dimension of 1 -- cool
gam(y~s(x0), data=dat)$smooth[[1]]$null.space.dim
# ts has a null space dimension of 0 -- also cool
gam(y~s(x0, bs="ts"), data=dat)$smooth[[1]]$null.space.dim
# what about B-splines with a continuous penalty? hmmm
gam(y~s(x0, bs="bs"), data=dat)$smooth[[1]]$null.space.dim
# now using code from the paper ;)
library(gratia)
library(dplyr)
library(ggplot2)
# basis get
bfs <- basis(gam(y~s(x0, bs="bs"), data=dat), select = "s(x0)")
# plot basis functions and fitted spline
bfs |>
ggplot(aes(x = x0, y = .value)) +
geom_line(aes(colour = .bf)) +
guides(colour = "none") +
labs(x = "x", y = "y", title = "s(x)")
Well that gives us this:
which I don't see any "linear" terms in... ¯_(ツ)_/¯
Going back to the boring situation in which we "read the documentation", in ?smooth.construct
we have:
null.space.dim: The dimension of the penalty null space (before centering).
So I think this is before identifiability constraints are applied (since they need to be applied at the model, not term level. So I think the nullspace just consists of the intercept in that case.
Doing a tiny bit more digging:
jagam(y~s(x0, bs="bs"), data=dat, centred=FALSE, file="m1.jags")
jagam(y~s(x0, bs="bs")-1, data=dat, centred=FALSE, file="m0.jags")
Produces:
model {
mu <- X %*% b ## expected response
for (i in 1:n) { y[i] ~ dnorm(mu[i],tau) } ## response
scale <- 1/tau ## convert tau to standard GLM scale
tau ~ dgamma(.05,.005) ## precision parameter prior
## prior for s(x0)...
K1 <- S1[1:10,1:10] * lambda[1] + S1[1:10,11:20] * lambda[2]
b[1:10] ~ dmnorm(zero[1:10],K1)
## smoothing parameter priors CHECK...
for (i in 1:2) {
lambda[i] ~ dgamma(.05,.005)
rho[i] <- log(lambda[i])
}
}
model {
mu <- X %*% b ## expected response
for (i in 1:n) { y[i] ~ dnorm(mu[i],tau) } ## response
scale <- 1/tau ## convert tau to standard GLM scale
tau ~ dgamma(.05,.005) ## precision parameter prior
## Parametric effect priors CHECK tau=1/79^2 is appropriate!
for (i in 1:1) { b[i] ~ dnorm(0,0.00016) }
## prior for s(x0)...
K1 <- S1[1:10,1:10] * lambda[1] + S1[1:10,11:20] * lambda[2]
b[2:11] ~ dmnorm(zero[2:11],K1)
## smoothing parameter priors CHECK...
for (i in 1:2) {
lambda[i] ~ dgamma(.05,.005)
rho[i] <- log(lambda[i])
}
}
So I think that makes sense? (Running to stand in front of my poster so I will think about this more in a wee bit!)
Going back to the boring situation in which we "read the documentation", in ?smooth.construct we have:
null.space.dim: The dimension of the penalty null space (before centering). So I think this is before identifiability constraints are applied (since they need to be applied at the model, not term level. So I think the nullspace just consists of the intercept in that case.
But for that to be true, wouldn't the null space dimension for a TPRS (the default) be = 2 because it has both constant and linear basis functions, yet as you show it is 1.
If we do
library("mgcv")
library("gratia")
df <- data_sim("gwf2", seed = 2)
m <- gam(y ~ s(x, bs = "cr"), data = df)
# extract penalty
S <- m$smooth[[1]]$S[[1]]
# eigen decompoisition
eigen(S)
producing
> eigen(S)
eigen() decomposition
$values
[1] 2.201890e+00 1.420716e+00 8.582862e-01 4.449231e-01 1.969655e-01
[6] 7.046850e-02 1.825633e-02 2.373200e-03 -6.021430e-17
where 1 eigenvalue is numerically 0 and the matrix is not invertible by normal means.
$> solve(S)
Error in solve.default(S) :
system is computationally singular: reciprocal condition number = 4.66992e-19
And this is what I would expect, because a linear function is still in the span of basis, even if there is no actual linear function.
Now consider the same model but with select = TRUE
, where we include an additional penalty on the null space:
m <- gam(y ~ s(x, bs = "cr"), data = df, select = TRUE)
Now the null space of this smooth is 0, as it should be because of the extra penalty:
> m$smooth[[1]]$null.space.dim
[1] 0
Some more discussion is needed here...
Some further reading has helped, if we're just talking about the Null space of a penalty. The key point I think is that the null space is the span of functions representable by the basis for which the penalty has no effect (and hence it's size is the number of such representable functions that are unaffected by the penalty). The null space doesn't refer to specific basis functions but to the functions representable by that basis.
Section 5.4.3 of Simon's GAM book has:
which I think is clear. This explains the findings in my comment above. I have assumed that the rank deficiency of $S$ is what leads to the improper prior when we view penalized smoothing from a Bayesian perspective. But I should read a little more about that...
Here's some clearer evidence that the smooths have a null space. This needs the very latest version of gratia from github but equivalent results can be achieved with mgcv::smoothCon(..., diagonal.penalty = TRUE, absorb.cons = TRUE)
.
We can rotate the basis functions using an eigendecomposition to reparameterize the smooth such that it has a diagonal (identity) penalty. The null space is then highlighted because the diagonal elements are zero for the null space.
remotes::install_github("gavinsimpson/gratia")
library("gratia")
library("patchwork")
df <- data_sim("gwf2", seed = 2)
p1 <- basis(s(x, bs = "bs"), data = df,
constraints = TRUE, diagonalize = TRUE) |>
draw()
p2 <- penalty(s(x, bs = "bs"), data = df,
constraints = TRUE, diagonalize = TRUE) |>
draw()
p1 + p2 + plot_layout(ncol = 2, nrow = 1)
Here, we're using a B-spline basis and have absorbed the sum-to-zero identifiability constraint into the basis and then reparameterized to diagonalize the penalty. Absorbing the identifiability constraint means the constant / zero function is not in the span of the basis. But the linear functions remain in the span of the basis. This is clear from the reparametrisation shown above, where we see an actual linear basis function and a zero row/column in the penalty matrix.
The reparameterisation doesn't change anything with respect to the original basis, it just highlights the null space of the basis more clearly.
After some extensive chat here, online and offline (by nice coincidence at the same conference), we've got somewhere on this (the improper prior business)... I'll write this collectively for both our findings but if @gavinsimpson needs to correct me anywhere, I trust that will happen.
sp=10e10
, e.g.) -- we have a straight line, even though we don't have an explicit slope termts
and cs
bases (where there is a small penalty added to the null space terms)select=TRUE
additional penalty matrices for the null space termstp
, ps
, bs
, maybe others), where the slope is not in the null space and the intercept is removed with absorbing constraints.Does that cover everything @gavinsimpson?
Then two additional comments because @gavinsimpson mistakenly asked me to look more closely at his equations...
(this list interrupted by my realising that github now renders $\LaTeX$, wow! That's going to make the next bit easier...)
@dill
The formulation for the GAM (first display equation) doesn't allow for multivariate smooths.
Fixed; Dave and I spoke about this while at ISEC this summer and we couldn't really think of a better way to represent the most common form of a GAM than tediously write out multiple smooth functions $f_j()$ in the equation. So, I have updated the manuscript to match Dave's suggestion and usage in his Bayesian interpretation of GAMs preprint.
...In addition you only talk about scale parameters as the other parameters of the distribution.
This was intentional; just wanted to talk about a standard GLM-like GAM. I have now written out the Tweedie distributional model in full later.
"GAMs fitted by mgcv are an empirical Bayesian model with an improper multivariate normal prior on the basis function coeficients." Is that always the case? One might argue that the intercept is always in the nullspace of every penalty, but one could remove that from the model and have a proper MVN prior? Could just put the improper in brackets with the word "usually"?
Done.
"which, with the default penalty, is the integrated squared second derivative" -- I don't think this is quite right.
Done.
Is Figure 1a the basis functions or the basis functions multiplied by their corresponding coefficients? I think it's the latter?
Yep, it's the latter. Fixed.
In previous snippet, coefficients, not "coeficients"
Fixed
Miller (2019) should be Miller (2021) -- I updated it on arXiv.
Done
"The response is assumed to be conditionally distributed Tweedie" -> "The response is assumed to be conditionally Tweedie distributed"?
Done
"Model coefficients and smoothing parameters are estimated using restricted maximum likelihood (Wood, 2011)" missing period at end of sentence?
Done
Explanation of ctrl <- gam.control(nthreads = 10)?
Have added a small explanation
"which show significant heteroscedasticity and departure from the condtional distribution of the response given the model" -- could highlight that we see this due to the increasing spread in the deviance residuals wrt values of the linear predictor in the top right plot?
Done
"given the absence of important effects in the model" not sure what this means?
I was trying to convey the point that given we don't have many potentially important covariates, the original model may be too inflexible to capture the variation that would be explained by the missing covariates if we had them. If have added
"which may be too inflexible if missing covariates mean that important effects are not included in the model"
to try to better explain what I mean.
Done
In the previous snippet "condtional" -> "conditional"
Done
In the Tweedie LSS model lead-in, it looks like you're using \varphi rather than \phi for the scale parameter (as you did in the intro), not sure if this was intentional?
It was intentional; I used $\phi$ for the GLM dispersion parameter, while I use $\varphi$ for the scale parameter of the Tweedie. I think the twlss()
model still has a dispersion parameter but, like the Poisson, it is fixed to equal 1.
But, it was a little confusing. I think I have things right now; I still want to distinguish between the dispersion parameter (it just happens to be the Tweedie scale parameter in model m1
) in general when I write the general GAM equation and the Tweedie scale parameter when I am talking about the Tweedie distribution and the distributional GAM, which is more specific.
"little data wrangling, we can produce an uncertaintry estimate ,using fitted_samples() to" misplaces comma.
Fixed
Miller (2019) should be Miller (2021) In the references: ... [mistakes in the bibliography]
I have now fixed the ones you found. I missed the brms one and the other things were weird entries in my ref manager metadata. I'll do another sweep once the reviews are concluded.
Generally, I think folks use... (from a more recent comment) on combining the...
Done and I think Done. Perhaps you could give this last part (the one about combining the $\gamma$s in with the $\beta$s
@dill Thanks all the comments and suggestions and importantly taking the time to talk through some of the prior / null space things we both touch on in the comments above.
I've addressed all your comments (see above) bar 1:
It would be nice to have a legend in Figure 1 so readers can relate the basis functions to the values in the penalty matrix (and e.g., see that the most wiggly function has the highest penalty)? Open to push-back on this, if you want to just give default plots from gratia
Done now. The basis function plot was not a plot that was generated by one of gratia's plots, the penalty plot was but the basis function one needed some level of customisation anyway, so I added labelled lines.
I'll push this and then GH Actions will render the pdf to check. Will need to be looking in the joss-paper branch.
(Part of review here.)
This was fun to read and I learnt some things :)
Some minor statistical/editorial comments on the paper:
gratia
github. Not sure if this is normal.twlss()
model), there are more parameters to consider. You might not want to address this in the intro -- that's fine!gratia
.ctrl <- gam.control(nthreads = 10)
?\varphi
rather than\phi
for the scale parameter (as you did in the intro), not sure if this was intentional?fitted_samples()
to" misplaces comma.brms
" but "Brms" looks weird.