Closed pboesu closed 1 year ago
Hi Philipp, yeah, handling these extended formula models (distributional regression models) is not something gratia does at the moment. Things work for the smooths because Simon just stores them as if they belonged to a single model with many smooths. The parametric terms code was written without considering those more general models.
This is on the ToDo list, which I hope to get to once I get a new course written or at the start of term, whichever happens first.
I worked out what was going on: a combination of `parametric_terms()` returning a list for models with more than one linear predictor, and then needing to rename terms according to mgcv's weird convention of adding .1, .2 to all smooths and terms in the higher linear predictors.
This should now work for all LSS models, but I have thus far only checked the ziplss examples you provided.
Thanks for reporting this.
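For anyone following along, the naming convention can be seen by inspecting the coefficient names of a model with two linear predictors. This is a minimal sketch, not the gratia fix itself: the data are simulated purely for illustration, and `gaulss()` is used in place of `ziplss()` only to keep the simulation simple.

```r
library(mgcv)

## Hypothetical simulated data, for illustration only
set.seed(1)
n <- 300
dat <- data.frame(x0 = runif(n), x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- rnorm(n, mean = sin(2 * pi * dat$x2) + dat$x3, sd = exp(0.5 * dat$x1))

## Two formulae: the first for the mean, the second for the scale
m <- gam(list(y ~ s(x2) + x3, ~ s(x0) + x1), family = gaulss(), data = dat)

## Terms belonging to the second linear predictor carry a numeric suffix
nms <- names(coef(m))
print(nms)
```

Any code that looks terms up by the name used in the formula has to allow for these suffixed names.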
Hi Gavin, thanks so much for looking into this and expanding the functionality of this great package!
Unfortunately I discovered that there's a further complication in my use case that wasn't covered by my original reprex. It appears that categorical predictors cause additional problems.
Amended reprex:
```r
#gratia error reprex
library(mgcv)
library(gratia)
## simulate some data...
f0 <- function(x) 2 * sin(pi * x); f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 *
  (10 * x)^3 * (1 - x)^10
n <- 500;set.seed(5)
x0 <- runif(n); x1 <- runif(n)
x2 <- runif(n); x3 <- runif(n)
x4 <- sample(factor(c('a','b','c')), size = n, replace = TRUE)
## Simulate probability of potential presence...
eta1 <- f0(x0) + f1(x1) - 3
p <- binomial()$linkinv(eta1)
y <- as.numeric(runif(n)<p) ## 1 for presence, 0 for absence
## Simulate y given potentially present (not exactly model fitted!)...
ind <- y>0
eta2 <- f2(x2[ind])/3
y[ind] <- rpois(exp(eta2),exp(eta2))
## Fit ZIP model...
b <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b)
#ZIP model with a categorical predictor
b0 <- gam(list(y~s(x2)+x4,~s(x0)+x1),family=ziplss())
draw(b0) # works but warning issued
plot(b0,pages=1, all.terms = TRUE) #works
#ZIP model with linear and categorical predictor
b1 <- gam(list(y~s(x2)+x3+x4,~s(x0)+x1),family=ziplss())
draw(b1) # fails
draw(b1, parametric = FALSE) # works
plot(b1,pages=1, all.terms = TRUE) #works
```
Having a single categorical predictor yields a warning, but the partial effect plot appears.
```
Warning messages:
1: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'x4' is absent, its contrast will be ignored
```
Having a categorical predictor and another term leads to a failure:
```
Error: Columns 3, 4, 5, 6, 7, ... (and 3 more) must be named.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace
In addition: Warning messages:
1: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'x4' is absent, its contrast will be ignored
2: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'x4' is absent, its contrast will be ignored
3: In predict.gam(object, newdata = mf, type = "terms", terms = mgcv_names[ind], :
  non-existent terms requested - ignoring
```
with the following traceback:
```
<error>
message: Columns 3, 4, 5, 6, 7, ... (and 3 more) must be named.
Use .name_repair to specify repair.
class: `rlang_error`
backtrace:
  1. gratia::draw(b1)
  2. gratia:::draw.gam(b1)
  4. gratia:::evaluate_parametric_term.gam(object, term = terms[i])
  6. tibble:::as_tibble.data.frame(evaluated)
  7. tibble:::as_tibble.list(unclass(x), ..., .rows = .rows, .name_repair = .name_repair)
  8. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
  9. tibble:::set_repaired_names(x, .name_repair)
 14. tibble:::repaired_names(names(x), .name_repair = .name_repair)
 15. tibble:::check_unique(new_name)
Call `rlang::last_trace()` to see the full backtrace
```
which to me appears to indicate that names are being lost as they are being passed from the model frame to the prediction frames, but my knowledge of programming with tibbles is very limited. Any clues?
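The `variable 'x4' is absent` warning can be reproduced in base R, which may help narrow things down. The sketch below is only a guess at the mechanism, not something confirmed in this thread: it assumes the contrasts stored for the whole model are being passed to a design matrix for terms that do not include `x4`. The data frame `d` is hypothetical.

```r
## Hypothetical data: one numeric covariate and one factor
set.seed(1)
d <- data.frame(x1 = runif(10), x4 = factor(rep(c("a", "b"), 5)))

## Build a design matrix for a formula WITHOUT x4 while still supplying
## a contrast for x4; capture the warning message instead of printing it
msg <- tryCatch(
  model.matrix(~ x1, data = d, contrasts.arg = list(x4 = "contr.treatment")),
  warning = function(w) conditionMessage(w)
)
print(msg)
```

If that is what happens here, the warning itself would be harmless, and the real problem would be the unnamed columns reported in the error.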
Thanks for the revised example; I was likely missing some subtleties when implementing this. mgcv can fit such a range of models and this was worked on for simpler models; guess I need to expand the test suites to include some more complex examples.
I’ll take a look at why this is failing.
Finally found some time to fix this after I'd completely changed the way I handle parametric effects (`evaluate_parametric_term()` has been soft deprecated for a little while now). Sorry it took so long.
I'm working with models from the ziplss family that have parametric (i.e. non-smooth) terms, and these cause the `evaluate_parametric_term` method (and hence `draw`) to fail with an error message like `Error in evaluate_parametric_term.gam(object, term = terms[i]) : Term is not in the parametric part of model: <x3>`, because it does not handle the list structure of the two predictor formulae gracefully.
Reprex collapsed in here
```r
#gratia error reprex
library(mgcv)
library(gratia)
## simulate some data...
f0 <- function(x) 2 * sin(pi * x); f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 *
  (10 * x)^3 * (1 - x)^10
n <- 500;set.seed(5)
x0 <- runif(n); x1 <- runif(n)
x2 <- runif(n); x3 <- runif(n)
## Simulate probability of potential presence...
eta1 <- f0(x0) + f1(x1) - 3
p <- binomial()$linkinv(eta1)
y <- as.numeric(runif(n)<p) ## 1 for presence, 0 for absence
## Simulate y given potentially present (not exactly model fitted!)...
ind <- y>0
eta2 <- f2(x2[ind])/3
y[ind] <- rpois(exp(eta2),exp(eta2))
## Fit ZIP model...
b <- gam(list(y~s(x2)+s(x3),~s(x0)+s(x1)),family=ziplss())
draw(b)
b1 <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b1) # fails
plot(b1,pages=1, all.terms = TRUE) #works
```
I've traced the issue to
https://github.com/gavinsimpson/gratia/blob/245ac45e87d64ef347ff8bf90af372cbf0273590/R/evaluate_smooth.R#L546
which doesn't behave as expected, as `tt` is now a two-element list with labels `"1" "2"`.
My hacky attempt to fix this by changing the above line to
```r
vars <- unlist(lapply(tt, labels))
```
produces reasonable results if there is only a single parametric predictor in the model, but falls over with a cryptic `plot_grid` error that I can trace to `align_margin`, but which I don't understand.
sessionInfo()
```r
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] gratia_0.2-8 mgcv_1.8-28  nlme_3.1-140

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       rstudioapi_0.10  magrittr_1.5     splines_3.6.1    tidyselect_0.2.5 munsell_0.5.0
 [7] cowplot_1.0.0    colorspace_1.4-1 lattice_0.20-38  R6_2.4.0         rlang_0.4.0      dplyr_0.8.3
[13] tools_3.6.1      packrat_0.5.0    grid_3.6.1       gtable_0.3.0     lazyeval_0.2.2   assertthat_0.2.1
[19] tibble_2.1.3     crayon_1.3.4     Matrix_1.2-17    tidyr_0.8.3      purrr_0.3.2      ggplot2_3.2.0
[25] glue_1.3.1       labeling_0.3     compiler_3.6.1   pillar_1.4.2     scales_1.0.0     mvtnorm_1.0-11
[31] pkgconfig_2.0.2
```
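The `labels()` behaviour described above can be reproduced without fitting a model. In this minimal base R sketch, the list `tt` is a hypothetical stand-in for the per-predictor terms of a two-formula model (an assumption about what gratia sees internally), built directly from two formulae.

```r
## Hypothetical stand-in: one terms object per linear predictor
tt <- lapply(list(y ~ x3 + x4, ~ x1), terms)

## labels() on a plain unnamed list falls back to the element indices
list_labels <- labels(tt)
print(list_labels)

## Applying labels() to each terms object recovers the term labels
vars <- unlist(lapply(tt, labels))
print(vars)
```

Note that the flattened vector no longer records which linear predictor each term belongs to, so any later lookup by name would still need to account for that.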