Closed strengejacke closed 4 years ago
That's an interesting edge case!
The related regexp might be this one in the helper-function .get_fixed_effects()
:
https://github.com/easystats/insight/blob/442ad486a723e0b942eb27bf8d18efc1485033c8/R/helper_functions.R#L98
this might be a lot easier using lme4::nobars()
...
I'm not sure, I think we have implemented it this way to make it work with other packages like lme or GLMMadaptive as well.
Or maybe to reduce package dependencies.
Or maybe to work with panel models, like plm, feist or panelr. There has been a reason to generalize this function.
OK. My only point was that working with the language object, as much of a pain as it is, is often more robust than deparsing and using regexes ...
Ok, I can narrow down this a bit. .get_fixed_effects()
is called from insight::find_formula()
for all model-objects that can have random effects. This includes packages like lme4, glmmTMN, GLMMadaptive, brms, rstanarm, afex, coxme, ordinal, and some more.
Now there were two reasons for finding an own solution: 1) insight, as low-level-package, should only depend on packages that ship with R (base, utils, stats, ...) 2) the function should also work for "complicated" formulas, especially for brms, there are some edge cases we needed to capture.
here are two examples (ignore the response-part), the second one giving a non-desired result.
lme4::nobars(as.formula("time | cens(censored) ~ age * sex + disease + (1|patient)"))
#> age * sex + disease
insight:::.get_fixed_effects(as.formula("time | cens(censored) ~ age * sex + disease + (1|patient)"))
#> [1] "time | cens(censored) ~ age * sex + disease"
lme4::nobars(as.formula("success | trials(ntrials) ~ x + (1 | patient)"))
#> x ~ 1
insight:::.get_fixed_effects(as.formula("success | trials(ntrials) ~ x + (1 | patient)"))
#> [1] "success | trials(ntrials) ~ x"
Created on 2019-07-01 by the reprex package (v0.3.0)
But maybe it is indeed better to find a solution based on the language-object itself instead of using regex for the parsed formula-string. I'll look into this.
I think I figured out the required regex pattern: "(\\+|:|\\*|/)(?![^\\(]*\\))"
I checked various examples, all extract the correct fixed effects terms and the random effects, so I can remove the RE terms from the formula:
pattern <- "(\\+|:|\\*|/)(?![^\\(]*\\))"
p <- "am ~ disp:wt + (1|gear) + wt + (1+wt|carb)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "disp" "wt" "wt"
p <- "am ~ disp:wt + wt + (1|gear) + (1+wt|carb)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "disp" "wt" "wt"
p <- "am ~ (1|gear) + (1+wt|carb) + disp:wt + wt"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "disp" "wt" "wt"
p <- "am ~ 1 + (1|gear)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "1"
p <- "am ~ disp:wt * (1|gear) + wt + (1+wt|carb)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "disp" "wt" "wt"
p <- "am ~ disp:wt * (1|gear) + wt + (1*wt|carb)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "disp" "wt" "wt"
p <- "am ~ disp:wt * wt + (1|gear) + (1+wt|carb)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "disp" "wt" "wt"
p <- "am ~ (1|gear) + (1+wt|carb) + disp:wt * wt"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "disp" "wt" "wt"
p <- "am ~ 1 + (1|gear)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "1"
p <- "am ~ disp:wt + poly(gear, 2) + wt + (1+wt|carb)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "disp" "wt" "poly(gear, 2)" "wt"
p <- "y ~ post + time1 + (1 | g2 / g1 / g0) + (post + time1 - 1 | g2)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "post" "time1"
p <- "count ~ mined + (1 | site) + offset(Wtemp)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "mined" "offset(Wtemp)"
p <- "count ~ mined + offset(Wtemp) + (1 | site)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "mined" "offset(Wtemp)"
p <- "time | cens(censored) ~ age * sex + disease + (1|patient)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "age" "sex" "disease"
p <- "success | trials(ntrials) ~ x + (1 | patient)"
p <- deparse(as.formula(p)[[3]])
x <- lapply(strsplit(p, pattern, perl = TRUE), trimws)
unlist(x)[!grepl("\\((.*)\\|(.*)\\)", unlist(x))]
#> [1] "x"
Created on 2020-07-12 by the reprex package (v0.3.0)
I certainly haven't tried this on all your new examples, but simply restricting lme4::nobars()
to apply only to the right-hand side of the formula resolves the original examples where nobars gave an undesired/incorrect answer.
As for self-containedness, lme4::nobars
(and the associated hidden utility lme4::nobars_()
) could easily be copied ...
f1 <- as.formula("time | cens(censored) ~ age * sex + disease + (1|patient)")
f2 <- as.formula("success | trials(ntrials) ~ x + (1 | patient)")
my_nb <- function(x) {
x[[3]] <- lme4::nobars(x[[3]])
return(x)
}
lme4::nobars(f1)
## age * sex + disease
insight:::.get_fixed_effects(f1)
## "time | cens(censored) ~ age * sex + disease"
my_nb(f1)
lme4::nobars(f2)
## x ~ 1
insight:::.get_fixed_effects(f1)
## [1] "success | trials(ntrials) ~ x"
my_nb(f1)
## time | cens(censored) ~ age * sex + disease
PS you might want to use the new deparse1
function to protect yourself against the situation where users decide to include reeeeeeeeeeeeeeeeeeeeeeeeeeaaaaaaalllllllllllly long variable names or formulas ...
Indeed, using lme4::nobars()
only on the RHS of the formula would solve the issue, I think. The reprex (with some additional modifications) and nobars()
both seem to work fine on all examples (see below).
@ deparse1
: yes, we are using an internal .safe_deparse()
function, that literally does the same:
https://github.com/easystats/insight/blob/master/R/helper_functions.R#L472-L477
I'm aware of the bnew deparse1()
, however, due to back compatibility with older R versions, I would stick to our own solution.
mynb <- function(x) {
x <- as.formula(x)
x[[3]] <- lme4::nobars(x[[3]])
x
}
mynb2 <- function(x) {
x <- as.formula(x)
# only RHS of formula, and regex pattern
fe <- deparse(x[[3]])
pattern <- "(\\+|:|\\*|/)(?![^\\(]*\\))"
# trim at operators
parts <- lapply(strsplit(fe, pattern, perl = TRUE), trimws)
# extract RE parts of formula
re <- unlist(parts)[grepl("\\((.*)\\|(.*)\\)", unlist(parts))]
# remove each RE part
for (i in re) {
fe <- gsub(i, "", fe, fixed = TRUE)
}
# remove duplicated operator signs. When we have "x1 + (1|grp) + x2",
# after removing we have "x1 + + x2".
fe <- trimws(gsub("(\\+)(\\s*)(\\+)", "+", fe))
# we might have "+" at start or end, "+ x1 + x2" or "x1 + x2 +"
fe <- trimws(gsub("(\\s*)\\+$", "", fe))
fe <- trimws(gsub("^\\+", "", fe))
# re-build formula
as.formula(paste0(deparse(x[[2]]), " ~ ", fe))
}
p <- "am ~ disp:wt + (1|gear) + wt + (1+wt|carb)"
mynb(p)
#> am ~ disp:wt + wt
#> <environment: 0x0000000015924d50>
mynb2(p)
#> am ~ disp:wt + wt
#> <environment: 0x000000001b5064f0>
p <- "am ~ disp:wt + wt + (1|gear) + (1+wt|carb)"
mynb(p)
#> am ~ disp:wt + wt
#> <environment: 0x00000000195df7c0>
mynb2(p)
#> am ~ disp:wt + wt
#> <environment: 0x000000001910b410>
p <- "am ~ (1|gear) + (1+wt|carb) + disp:wt + wt"
mynb(p)
#> am ~ disp:wt + wt
#> <environment: 0x0000000019030d18>
mynb2(p)
#> am ~ disp:wt + wt
#> <environment: 0x0000000018fb41c0>
p <- "am ~ 1 + (1|gear)"
mynb(p)
#> am ~ 1
#> <environment: 0x000000001388c230>
mynb2(p)
#> am ~ 1
#> <environment: 0x000000001be28800>
p <- "am ~ disp:wt * (1|gear) + wt + (1+wt|carb)"
mynb(p)
#> am ~ disp:wt + wt
#> <environment: 0x000000001befab48>
mynb2(p)
#> am ~ disp:wt * +wt
#> <environment: 0x000000001a717530>
p <- "am ~ disp:wt * (1|gear) + wt + (1*wt|carb)"
mynb(p)
#> am ~ disp:wt + wt
#> <environment: 0x000000001c1626e8>
mynb2(p)
#> am ~ disp:wt * +wt
#> <environment: 0x000000001c1c8ee8>
p <- "am ~ disp:wt * wt + (1|gear) + (1+wt|carb)"
mynb(p)
#> am ~ disp:wt * wt
#> <environment: 0x000000001c2845b0>
mynb2(p)
#> am ~ disp:wt * wt
#> <environment: 0x000000001c2ef1c8>
p <- "am ~ (1|gear) + (1+wt|carb) + disp:wt * wt"
mynb(p)
#> am ~ disp:wt * wt
#> <environment: 0x000000001c3af230>
mynb2(p)
#> am ~ disp:wt * wt
#> <environment: 0x000000001c417918>
p <- "am ~ 1 + (1|gear)"
mynb(p)
#> am ~ 1
#> <environment: 0x000000001c4ec4a8>
mynb2(p)
#> am ~ 1
#> <environment: 0x000000001c547db8>
p <- "am ~ disp:wt + poly(gear, 2) + wt + (1+wt|carb)"
mynb(p)
#> am ~ disp:wt + poly(gear, 2) + wt
#> <environment: 0x000000001c646e60>
mynb2(p)
#> am ~ disp:wt + poly(gear, 2) + wt
#> <environment: 0x000000001c6af698>
p <- "y ~ post + time1 + (1 | g2 / g1 / g0) + (post + time1 - 1 | g2)"
mynb(p)
#> y ~ post + time1
#> <environment: 0x000000001c784180>
mynb2(p)
#> y ~ post + time1
#> <environment: 0x000000001c9a8580>
p <- "count ~ mined + (1 | site) + offset(Wtemp)"
mynb(p)
#> count ~ mined + offset(Wtemp)
#> <environment: 0x000000001cb03728>
mynb2(p)
#> count ~ mined + offset(Wtemp)
#> <environment: 0x000000001cb69d90>
p <- "count ~ mined + offset(Wtemp) + (1 | site)"
mynb(p)
#> count ~ mined + offset(Wtemp)
#> <environment: 0x000000001cc24060>
mynb2(p)
#> count ~ mined + offset(Wtemp)
#> <environment: 0x000000001cc87040>
p <- "time | cens(censored) ~ age * sex + disease + (1|patient)"
mynb(p)
#> time | cens(censored) ~ age * sex + disease
#> <environment: 0x000000001cd41310>
mynb2(p)
#> time | cens(censored) ~ age * sex + disease
#> <environment: 0x000000001cda5cc0>
p <- "success | trials(ntrials) ~ x + (1 | patient)"
mynb(p)
#> success | trials(ntrials) ~ x
#> <environment: 0x000000001ce64348>
mynb2(p)
#> success | trials(ntrials) ~ x
#> <environment: 0x000000001cec36a8>
Created on 2020-07-12 by the reprex package (v0.3.0)
I think the regex pattern still fails for an edge-case (parenthesis inside RE parenthesis):
m <- lmer(
log(Reaction) ~ Days + I(Days^2) + (1 + Days + exp(Days) | Subject),
data = sleepstudy
)
while nobars()
on the RHS still works fine. Furthermore, benchmarks using microbenchmark showed that the fastest regex-implementation I found was still twice as long as calling nobars()
(though we're talking of nano-seconds here). So finally I'm indeed using nobars.
Strange enough that it took me one year to realize that the solution is simply restricting nobars()
to the RHS of the formula... 😖 🙈
This is an issue in insight:
As my regular expression knowledge is too limited to currently find a solution, I'd suggest you just re-write your formula and put the offset-term before the random term(s).
Originally posted by @strengejacke in https://github.com/strengejacke/ggeffects/issues/72#issuecomment-484003127