Parametric term evaluation for ziplss models

pboesu commented 5 years ago

I'm working with models from the ziplss family that have parametric (i.e. non-smooth) terms, and these cause the evaluate_parametric_term method (and hence draw) to fail with an error message like

Error in evaluate_parametric_term.gam(object, term = terms[i]) : Term is not in the parametric part of model: <x3>,

because it does not handle the list structure of the two predictor formulae gracefully.

Reprex collapsed in here

```r #gratia error reprex library(mgcv) library(gratia) ## simulate some data... f0 <- function(x) 2 * sin(pi * x); f1 <- function(x) exp(2 * x) f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10 n <- 500;set.seed(5) x0 <- runif(n); x1 <- runif(n) x2 <- runif(n); x3 <- runif(n) ## Simulate probability of potential presence... eta1 <- f0(x0) + f1(x1) - 3 p <- binomial()$linkinv(eta1) y <- as.numeric(runif(n)0 eta2 <- f2(x2[ind])/3 y[ind] <- rpois(exp(eta2),exp(eta2)) ## Fit ZIP model... b <- gam(list(y~s(x2)+s(x3),~s(x0)+s(x1)),family=ziplss()) draw(b) b1 <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss()) draw(b1) # fails plot(b1,pages=1, all.terms = TRUE) #works ```

I've traced the issue to

https://github.com/gavinsimpson/gratia/blob/245ac45e87d64ef347ff8bf90af372cbf0273590/R/evaluate_smooth.R#L546

which doesn't behave as expected as tt is now a two-element list with labels "1" "2".

My hacky attempt to fix this by changing the above line to

vars <- unlist(lapply(tt, labels))

produces reasonable results if there is only a single parameteric predictor in the model, but falls over with a cryptic plot_grid error that I can trace to align_margin, but which I don't understand:

#after hack-fixing evaluate_parametric_term

b2 <- gam(list(y~s(x2)+x3,~s(x0)+s(x1),family=ziplss())
draw(b2) #this works

b3 <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b3) # this fails

#Error in 1:(grep("null", x)[1] - 1) : NA/NaN argument
#In addition: Warning message:
#In predict.gam(object, newdata = mf, type = "terms", terms = term,  :
#  non-existent terms requested - ignoring
#Called from: FUN(X[[i]], ...)

sessionInfo()

```r R version 3.6.1 (2019-07-05) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17134) Matrix products: default locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C [5] LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] gratia_0.2-8 mgcv_1.8-28 nlme_3.1-140 loaded via a namespace (and not attached): [1] Rcpp_1.0.2 rstudioapi_0.10 magrittr_1.5 splines_3.6.1 tidyselect_0.2.5 munsell_0.5.0 [7] cowplot_1.0.0 colorspace_1.4-1 lattice_0.20-38 R6_2.4.0 rlang_0.4.0 dplyr_0.8.3 [13] tools_3.6.1 packrat_0.5.0 grid_3.6.1 gtable_0.3.0 lazyeval_0.2.2 assertthat_0.2.1 [19] tibble_2.1.3 crayon_1.3.4 Matrix_1.2-17 tidyr_0.8.3 purrr_0.3.2 ggplot2_3.2.0 [25] glue_1.3.1 labeling_0.3 compiler_3.6.1 pillar_1.4.2 scales_1.0.0 mvtnorm_1.0-11 [31] pkgconfig_2.0.2 ```

gavinsimpson commented 5 years ago

Hi Philipp, yeah, handling these extended formula models (distributional regression models) is not something gratia does at the moment; Things work for the smooths as Simon just stores them as if they were just a single model with many smooths. The parametric terms code was written without considering those more general models.

This is on the ToDo list that I hope to get to once I get a new course written/start of term, whichever happened first.

gavinsimpson commented 5 years ago

I worked out what was going on; a combination of parametric_terms() returning a list for models with more than one linear predictor, and then needing to rename terms according to mgcvs weird convention of add .1, .2 to all smooths and terms in the higher linear predictors.

This should now work for all LSS models, but I have thus far only check the ziplss examples you provided.

Thanks for reporting this.

pboesu commented 5 years ago

Hi Gavin, thanks so much for looking into this and expanding the functionality of this great package!

Unfortunately I discovered that there's a further complication in my use case that wasn't covered by my original reprex. It appears that categorical predictors cause additional problems.

Amended reprex:

#gratia error reprex
library(mgcv)
library(gratia)
## simulate some data...
f0 <- function(x) 2 * sin(pi * x); f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * 
  (10 * x)^3 * (1 - x)^10
n <- 500;set.seed(5)
x0 <- runif(n); x1 <- runif(n)
x2 <- runif(n); x3 <- runif(n)
x4 <- sample(factor(c('a','b','c')), size = n, replace = TRUE)

## Simulate probability of potential presence...
eta1 <- f0(x0) + f1(x1) - 3
p <- binomial()$linkinv(eta1) 
y <- as.numeric(runif(n)<p) ## 1 for presence, 0 for absence

## Simulate y given potentially present (not exactly model fitted!)...
ind <- y>0
eta2 <- f2(x2[ind])/3
y[ind] <- rpois(exp(eta2),exp(eta2))

## Fit ZIP model... 
b <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b)

#ZIP model with a categorical predictor
b0 <- gam(list(y~s(x2)+x4,~s(x0)+x1),family=ziplss())
draw(b0)# works but warning issued
plot(b0,pages=1, all.terms = TRUE) #works

#ZIP model with linear and categorical predictor
b1 <- gam(list(y~s(x2)+x3+x4,~s(x0)+x1),family=ziplss())
draw(b1) # fails
draw(b1, parametric = FALSE) # works
plot(b1,pages=1, all.terms = TRUE) #works

Having a single categorical predictor yields a warning, but the partial effect plot appears.

Warning messages:
1: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'x4' is absent, its contrast will be ignored

Having a categorical predictor and another term leads to a failure

Error: Columns 3, 4, 5, 6, 7, ... (and 3 more) must be named.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace
In addition: Warning messages:
1: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'x4' is absent, its contrast will be ignored
2: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'x4' is absent, its contrast will be ignored
3: In predict.gam(object, newdata = mf, type = "terms", terms = mgcv_names[ind],  :
  non-existent terms requested - ignoring

with the following traceback

<error>
message: Columns 3, 4, 5, 6, 7, ... (and 3 more) must be named.
Use .name_repair to specify repair.
class:   `rlang_error`
backtrace:
  1. gratia::draw(b1)
  2. gratia:::draw.gam(b1)
  4. gratia:::evaluate_parametric_term.gam(object, term = terms[i])
  6. tibble:::as_tibble.data.frame(evaluated)
  7. tibble:::as_tibble.list(unclass(x), ..., .rows = .rows, .name_repair = .name_repair)
  8. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
  9. tibble:::set_repaired_names(x, .name_repair)
 14. tibble:::repaired_names(names(x), .name_repair = .name_repair)
 15. tibble:::check_unique(new_name)
Call `rlang::last_trace()` to see the full backtrace

which to me appears to indicate that names are being lost as they are being passed from the model frame to the prediction frames, but my knowledge of programming with tibbles is very limited. Any clues?

gavinsimpson commented 5 years ago

Thanks for the revised example; I was likely missing some subtleties when implementing this. mgcv can fit such a range of models and this was worked on for simpler models; guess I need to expand the test suites to include some more complex examples.

I’ll take a look at why this is failing.

gavinsimpson commented 1 year ago

Finally found some time to fix this after I'd completely changed the way I handle parametric effects (evaluate_parametric_term() has bee soft deprecated for a little while now). Sorry it took so long.

gavinsimpson / gratia

Parametric term evaluation for ziplss models #45