biodiverse / ubms

Fit models to data from unmarked animals using Stan. Uses a similar interface to the R package 'unmarked', while providing the advantages of Bayesian inference and allowing estimation of random effects.
https://hmecology.github.io/ubms/
GNU General Public License v3.0

check_formula(): warning and incomplete check for long formula #59

Closed jniedballa closed 2 years ago

jniedballa commented 2 years ago

Hi Ken, when using a very long model formula, check_formula throws this warning:

In if (grepl(":|/", char)) { :
  the condition has length > 1 and only the first element will be used

This happens because the deparse() call at https://github.com/kenkellner/ubms/blob/36d9b88aa126ceb8fca1b236d57e99e199c8c629/R/submodel.R#L167 splits a long formula and returns a character vector of length > 1, of which only the first element is then checked for nested random effects.

Using deparse1() instead of deparse() would avoid that, but it is only available in R 4.0 and higher.

char <- paste(deparse(formula_long, width.cutoff = 500L))

seems to work though.
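A minimal sketch of why the split happens, using the formula from this report. deparse() breaks its output into several strings once a line exceeds width.cutoff (default 60, maximum 500), so a formula longer than 500 characters would still be split even with width.cutoff = 500L. Adding collapse to paste() is an assumption here, not part of the original suggestion; without it, paste() leaves the vector as it is.

```r
# Formula from this report: long enough to exceed the default width.cutoff of 60
formula_long <- formula("~ (longcovariate1 + longcovariate + longcovariate3 + longcovariate4 || group)")

# With default settings, deparse() returns more than one string
parts <- deparse(formula_long)
length(parts)

# Raising width.cutoff helps up to 500 characters; collapse joins whatever
# pieces remain, so the result is always a single string
char <- paste(deparse(formula_long, width.cutoff = 500L), collapse = " ")
length(char)
```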

Alternatively, if char is a vector of length > 1 (this might print unwanted output):

sapply(char, FUN = function(x){
  if(grepl(":|/", x)){
    stop("Nested random effects (using / and :) are not supported",
         call.=FALSE)
  }
  })
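A possible variant of the same idea that avoids the printed sapply() result: grepl() is already vectorized, so any() can reduce the per-chunk results to a single TRUE/FALSE. This also sidesteps the length > 1 condition, which is an error rather than a warning from R 4.2.0 onward. The helper name has_nested_re and the nested formula below are hypothetical and only for illustration; they are not part of ubms.

```r
# Hypothetical helper (not part of ubms): TRUE if any chunk contains ":" or "/"
has_nested_re <- function(char) {
  any(grepl(":|/", char))
}

# Illustrative long formula with a nested term at the end
f <- formula("~ longcovariate1 + longcovariate2 + longcovariate3 + longcovariate4 + (1 | site/plot)")
char <- deparse(f)  # may have length > 1 for a long formula

if (has_nested_re(char)) {
  message("Nested random effects (using / and :) are not supported")
}
```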

Here's a reproducible example:

formula_long <- formula("~ (longcovariate1 + longcovariate + longcovariate3 + longcovariate4 || group)")

char <- paste(deparse(formula_long))

if(grepl(":|/", char)){
  stop("Nested random effects (using / and :) are not supported",
       call.=FALSE)
}

Note: I didn't check how a long formula would affect get_reTrms a few lines below.

kenkellner commented 2 years ago

Thanks! I thought I had fixed all the long formula issues, but clearly not. I re-used this code in unmarked, and a few months ago I fixed it there by changing it to

 char <- paste(formula, collapse=" ")

https://github.com/rbchan/unmarked/blob/5de9df9ed80f5eca482fa4fe6f9c6a5f4f95d04b/R/mixedModelTools.R#L89

But I never brought it back over to ubms. This also seems to work with your example. Do you see any other issues with this approach?
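For reference, a short sketch of that approach applied to the example formula from this thread. paste() coerces the formula with as.character(), which gives one string per component (here "~" and the right-hand side), and collapse joins them into a single string that can be passed to grepl() safely.

```r
formula_long <- formula("~ (longcovariate1 + longcovariate + longcovariate3 + longcovariate4 || group)")

# One string per formula component, joined into a single string
char <- paste(formula_long, collapse = " ")
length(char)  # 1

# The condition now has length 1, so if() no longer complains
if (grepl(":|/", char)) {
  stop("Nested random effects (using / and :) are not supported",
       call. = FALSE)
}
```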

jniedballa commented 2 years ago

I guess it would be ok... but I also don't know enough about the inner workings of ubms to say with confidence. I'd be happy to try a development version of the package on my offending workflow and report back if you're uncertain.

kenkellner commented 2 years ago

Fixed in d9d619e.

In the process I discovered an issue when a formula specified random slopes for an R factor variable, which doesn't work properly. The workaround is to manually code them as indicator variables. I've added a better error message.
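A minimal sketch of what manually coding indicator variables could look like, assuming a hypothetical site covariate data frame with a factor habitat and a grouping factor group. The column names and the formula in the final comment are illustrative only and have not been checked against ubms.

```r
# Hypothetical site covariates; `habitat` is a factor with three levels
site_covs <- data.frame(
  habitat = factor(c("forest", "grass", "wetland", "forest")),
  group   = factor(c("a", "a", "b", "b"))
)

# model.matrix() expands the factor into 0/1 columns; drop the intercept column
ind <- model.matrix(~ habitat, data = site_covs)[, -1, drop = FALSE]
site_covs <- cbind(site_covs, ind)
names(site_covs)

# The numeric columns habitatgrass and habitatwetland could then stand in for
# the factor in a random-slope term, for example (illustrative, untested):
# ~ habitatgrass + habitatwetland + (1 + habitatgrass + habitatwetland || group)
```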

This is due to the way lme4 generates the random effects model matrix. I probably need to write my own code to handle this instead of using lme4.