Closed chrishaak closed 9 months ago
Thanks for the report and easy to implement examples to reproduce the issue. I think the fundamental issue is that I didn't consider 2d TPR or Duchon splines when I wrote the interval plotters for the sz and fs basis.
Would you mind adding the output from sessionInfo()
(or session info::session_info()
if you have it) to your report.
I'll confirm what's going on later this week – both models should be expected to work with gratia – and see what needs to be changed to make them work.
Thanks for the lightning-fast response Gavin. Sorry, I actually had the sessioninfo in my code but forgot to paste it!
R version 4.2.3 (2023-03-15) Platform: x86_64-conda-linux-gnu (64-bit) Running under: Ubuntu 20.04.6 LTS
Matrix products: default BLAS: /home/christopher/miniconda3/envs/r423_blis/lib/libblis.so.4.0.0 LAPACK: /home/christopher/miniconda3/envs/r423_blis/lib/liblapack.so.3.9.0
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=Cattached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] mgcv_1.9-0 nlme_3.1-163 gratia_0.8.2
loaded via a namespace (and not attached): [1] Rcpp_1.0.12 pillar_1.9.0 compiler_4.2.3 RColorBrewer_1.1-3 tools_4.2.3 mvnfast_0.2.8 lifecycle_1.0.4
[8] tibble_3.2.1 gtable_0.3.4 lattice_0.22-5 pkgconfig_2.0.3 rlang_1.1.2 Matrix_1.6-2 cli_3.6.1
[15] rstudioapi_0.15.0 patchwork_1.1.3 withr_2.5.2 dplyr_1.1.3 stringr_1.5.0 generics_0.1.3 vctrs_0.6.4
[22] isoband_0.2.7 grid_4.2.3 tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 fansi_1.0.5 ggplot2_3.4.4
[29] purrr_1.0.2 tidyr_1.3.0 farver_2.1.1 magrittr_2.0.3 scales_1.2.1 splines_4.2.3 colorspace_2.1-0
[36] labeling_0.4.3 utf8_1.2.4 stringi_1.8.1 munsell_0.5.0
Thanks for the report. As I suspected, the different errors were ultimately due the underlying code not knowing how to handle multivariate base smoothers.
"fs"
smoothsThe "fs"
smooth issue raised a different error because I was not careful enough to handle users specifying the factor anywhere in the smooth definition. I was assuming users would use s(x1, x2, f)
so factor last. I have fixed this particular issue such that if you use s(f, x1, x2)
or s(x1, f, x2)
smooth_estimates()
will now work correctly.
"sz"
smoothsThe error with that model was purely due to the code not anticipating a multivariate base smoother.
I now catch this and do something rather than let the obscure error pass through.
None of this actually helps you as in both cases {gratia} can't currently do anything useful from the plot side of things.
Right now I am just emitting messages for both your model examples:
r$> draw(i_fs)
Can't currently plot multivariate 'fs' smooths.
Skipping: s(Sepal.Length,Sepal.Width,Species)
r$> draw(i_sz)
Can't currently plot multivariate 'sz' smooths.
Skipping: s(Species,Sepal.Length,Sepal.Width)
but it will plot any other smooths it can plot.
It's not clear to me what to plot in the fs
case. I could plot a few of the smooth surfaces, randomly selecting which to plot? For the sz
case, I could plot the smooth "difference" surfaces, but how to actually do this in the draw()
output? Should I treat it like a by
smooth and draw a separate plot for each surface, or should I group them so that I only generate a single plot but use facetting to show the difference-from-reference surfaces for each level of the factor(s)?
A similar issue crops up with HGAMs with random tensor product smooths: t2(x1, x2, f, bs = c("cr", "cr", "re"), full = TRUE)
. Which of these should I plot and how?
So, back to you @chrishaak. I assume you weren't modelling the iris
data in reality, so what would you like to see plotted in both your model specifications?
I'll push fixed code shortly; you'll need to install from github or r-universe (if it builds; there's an issues with the dependency on the Matrix pkg which I need to figure out there) when I have pushed. Check that you get version >= 0.8.2.58. At least then you'll be able to use smooth_estimates()
to evaluate the smooths and then you can choose to plot them however you wish using {ggplot2}.
That's funny- I normally put the factor last in the 'fs' smooths but I vaguely remember there was a requirement somewhere along the way that I needed to put it first to plot 'sz' smooths? I guess that backfired in the 'fs' case...
For the 'sz' case, I was thinking along the lines that you suggest above, plotting the difference surface for each level of species. While I conceptually like the idea of being able to see the differences overlayed/plotted together (as it works so nicely in the 1-D 'sz' case), I'm afraid that this could get messy in the 2D case (particularly when there are many levels). So, from the perspective of legibility/interpretability, it might make sense to keep these plots independent (per-level), basically mimicking what you do in the 'by' smooth?. In my particular situation, I don't have a ton of levels, so it's not not an issue, but understanding that your code needs to be generalizable, maybe adopt an approach like "if nlevels > X, then draw a random sample of X levels", where X defaults to some sane value (8?) And if people want to plot them all or look at specific levels, that can be user-selectable, either by setting X higher, or requesting specific levels via a vector?
I'd probably adopt a similar approach for 'fs', just mimicking the 'by' smooth.
That's my 2 cents anyway - hope it's useful.
Thanks so much once again for all your help!
On Fri, Feb 2, 2024 at 6:37 AM Gavin Simpson @.***> wrote:
Thanks for the report. As I suspected, the different errors were ultimately due the underlying code not knowing how to handle multivariate base smoothers. "fs" smooths
The "fs" smooth issue raised a different error because I was not careful enough to handle users specifying the factor anywhere in the smooth definition. I was assuming users would use s(x1, x2, f) so factor last. I have fixed this particular issue such that if you use s(f, x1, x2) or s(x1, f, x2) smooth_estimates() will now work correctly. '"sz"` smooths
The error with that model was purely due to the code not anticipating a multivariate base smoother.
I now catch this and do something rather than let the obscure error pass through. … however
None of this actually helps you as in both cases {gratia} can't currently do anything useful from the plot side of things.
Right now I am just emitting messages for both your model examples:
r$> draw(i_fs) Can't currently plot multivariate 'fs' smooths. Skipping: s(Sepal.Length,Sepal.Width,Species)
r$> draw(i_sz) Can't currently plot multivariate 'sz' smooths. Skipping: s(Species,Sepal.Length,Sepal.Width)
but it will plot any other smooths it can plot.
It's not clear to me what to plot in the fs case. I could plot a few of the smooth surfaces, randomly selecting which to plot? For the sz case, I could plot the smooth "difference" surfaces, but how to actually do this in the draw() output? Should I treat it like a by smooth and draw a separate plot for each surface, or should I group them so that I only generate a single plot but use facetting to show the difference-from-reference surfaces for each level of the factor(s)?
A similar issue crops up with HGAMs with random tensor product smooths: t2(x1, x2, f, bs = c("cr", "cr", "re"), full = TRUE). Which of these should I plot and how?
So, back to you @chrishaak https://github.com/chrishaak. I assume you weren't modelling the iris data in reality, so what would you like to see plotted in both your model specifications?
I'll push fixed code shortly; you'll need to install from github or r-universe (if it builds; there's an issues with the dependency on the Matrix pkg which I need to figure out there) when I have pushed. Check that you get version >= 0.8.2.58.
— Reply to this email directly, view it on GitHub https://github.com/gavinsimpson/gratia/issues/249#issuecomment-1923632856, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORLQXDKJ3MPUJDDSQPVIL3YRTFZHAVCNFSM6AAAAABCTL6RJSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMRTGYZTEOBVGY . You are receiving this because you were mentioned.Message ID: @.***>
...I vaguely remember there was a requirement somewhere along the way that I needed to put it first to plot 'sz' smooths?
I also thought this was the case, but it turns out you can put it anywhere.
I'll see what I can do. Right now, by
smooths work easily because from mgcv's point of view these are really just entirely separate smooths - they are a separate entry in the $smooth
list that is return by gam()
, bam()
etc. "fs"
and "sz"
smooths are different; there is just one element in $smooth
for the entire "fs"
or "sz"
smooth. This is fine with univariate base smooths as I can produce one plot showing all levels (even if it gets messy, with the sz basis in particular), but that won't work for surfaces as I can't plot them on top of one another.
The solution I have used with trivariate and quadvariate smooths is to use facetting; the one "plot" would then have multiple facets, one per level of the factor.
Otherwise I'll need to look at how to return multiple ggplot objects for a single smooth if I am to follow the by smooth convention...
Gotcha... seems like facetting is the way to go then, if it is relatively straightforward to implement.
While we're on the topic (feel free to let me know if you'd rather I start a new issue here)...
I'm fitting to scaled covariate data (using 'scale'), but I'd like to see the "real" (back-transformed) predictor values in my visualizations. Short of going into each of the smooth estimates and manually replacing the x values, do you see any intelligent way to do this "across the board" using draw(), given the list of scaling info output by 'scale'? (I dug around a little and didn't see anything, but may have overlooked something)?
On Fri, Feb 2, 2024 at 10:50 AM Gavin Simpson @.***> wrote:
...I vaguely remember there was a requirement somewhere along the way that I needed to put it first to plot 'sz' smooths?
I also thought this was the case, but it turns out you can put it anywhere.
I'll see what I can do. Right now, by smooths work easily because from mgcv's point of view these are really just entirely separate smooths - they are a separate entry in the $smooth list that is return by gam(), bam() etc. "fs" and "sz" smooths are different; there is just one element in $smooth for the entire "fs" or "sz" smooth. This is fine with univariate base smooths as I can produce one plot showing all levels (even if it gets messy, with the sz basis in particular), but that won't work for surfaces as I can't plot them on top of one another.
The solution I have used with trivariate and quadvariate smooths is to use facetting; the one "plot" would then have multiple facets, one per level of the factor.
Otherwise I'll need to look at how to return multiple ggplot objects for a single smooth if I am to follow the by smooth convention...
— Reply to this email directly, view it on GitHub https://github.com/gavinsimpson/gratia/issues/249#issuecomment-1924147693, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORLQXGY62FC6JD3HOZSHULYRUDN3AVCNFSM6AAAAABCTL6RJSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMRUGE2DONRZGM . You are receiving this because you were mentioned.Message ID: @.***>
I'm fitting to scaled covariate data (using 'scale'), but I'd like to see the "real" (back-transformed) predictor values in my visualizations
I assume you mean that you have y ~ s(x)
where x
is the result of x <- scale(x_orig)[,1]
? There isn't a good solution to that currently. Right now I would suggest that you just evaluate the smooths with smooth_estimates()
, and then build your own plots. I'm looking at ways to make building your own plots that look like draw()s
plots even easier.
Until then, you could modify the output from smooth_estimates()
and then use its draw()
method:
library("dplyr")
library("mgcv")
library("gratia")
df <- data_sim("eg1") |>
mutate(x1_scl = (x1 - mean(x1)) / sd(x1))
m <- gam(y ~ s(x0) + s(x1_scl) + s(x2) + s(x3), data = df, method = "REML")
sm <- smooth_estimates(m)
# unscale x1_scl
x1_mean <- with(df, mean(x1))
x1_sd <- with(df, sd(x1))
sm <- sm |>
mutate(x1_scl = (x1_scl * x1_sd) + x1_mean)
draw(sm)
You don't get things like partial residuals or a rug plot, but it is pretty close to what draw()
will produce (as draw.gam()
calls the draw.smooth_estimates()
for each smooth internally).
Yep that's correct. Thanks for the suggestion - that is more I less what I've been doing (albeit less efficiently than your approach!).
Wonder if it would be feasible to allow one to supply x-scaling attributes somehow in a call to draw(), so it could apply them "on the fly"?
On Fri, Feb 2, 2024 at 11:22 AM Gavin Simpson @.***> wrote:
I'm fitting to scaled covariate data (using 'scale'), but I'd like to see the "real" (back-transformed) predictor values in my visualizations
I assume you mean that you have y ~ s(x) where x is the result of x <- scale(x_orig)[,1]? There isn't a good solution to that currently. Right now I would suggest that you just evaluate the smooths with smooth_estimates(), and then build your own plots. I'm looking at ways to make building your own plots that look like draw()s plots even easier.
Until then, you could modify the output from smooth_estimates() and then use its draw() method:
library("dplyr") library("mgcv") library("gratia")
df <- data_sim("eg1") |> mutate(x1_scl = (x1 - mean(x1)) / sd(x1)) m <- gam(y ~ s(x0) + s(x1_scl) + s(x2) + s(x3), data = df, method = "REML")
sm <- smooth_estimates(m)
unscale x1_scl
x1_mean <- with(df, mean(x1)) x1_sd <- with(df, sd(x1))
sm <- sm |> mutate(x1_scl = (x1_scl * x1_sd) + x1_mean) draw(sm)
You don't get things like partial residuals or a rug plot, but it is pretty close to what draw() will produce (as draw.gam() calls the draw.smooth_estimates() for each smooth internally).
— Reply to this email directly, view it on GitHub https://github.com/gavinsimpson/gratia/issues/249#issuecomment-1924202491, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORLQXBMTFCVZR4S7QSJ4ZTYRUHEDAVCNFSM6AAAAABCTL6RJSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMRUGIYDENBZGE . You are receiving this because you were mentioned.Message ID: @.***>
I'm fitting an HGAM with a global 2d smooth and a factor smooth interaction of the 2d smooth (via bs='sz' or 'fs'). For example:
When attempting to plot these with draw(), I receive (altogether different) errors for both. For the 'sz' smooth, I am getting:
while for the 'fs' smooth I am getting:
If I omit the second (factor-smooth interaction) term via
select=c(-2)
in the call to draw(), the first term plots fine for both of the fits.Finally, using the 'by' argument (instead of 'fs' or 'sz'), both terms plot fine. for example:
...draws both terms as expected. So it seems to be an issue with the 'fs' and 'sz' in particular...
Apologies if I am missing something here?
Thanks! Chris