Closed billdenney closed 2 years ago
How to handle "fixed" (as opposed to "fixed effect") parameters is a much more general issue that I think has not been tackled within broom.mixed
. It certainly seems useful to have a consistent way for these different parameter types to be delimited/for users to specify whether or not they want both estimated and fixed parameters (although in general it's easiest as a first step to provide everything and allow users to filter()
subsequently if they want). One way to identify fixed parameters is via an NA
standard error, but that's not a reliable marker (standard errors might also be NA
because they're not estimated by a particular package for a particular variable type, or because of numerical problems). I think the best solution is to add yet another column ... adding meta-data within the term names in a consistent way will work, of course, but doesn't seem (to me) to be good design.
Can we see if this has come up/raise this as an issue at the broom
repo ... ?
I'm happy to add the broom check (i.e. raise an issue there). But, I do think that the parameter should be shown since it is part of the coef()
. If it were omitted, there would not be a simple way to see that it was part of the model (so, the tidy()
results would seem to be incomplete and misleading relative to the actual model).
I agree that they should be included. The questions are (1) how to indicate which parameters are fixed; (2) whether to allow a built-in argument for excluding fixed parameters (there are plenty of use cases where we want to do this, even if the default should be "include both"). I think your solution is OK, I am just a little bit paranoid about making interface/API decisions and being stuck with them/having to break backward compatibility if & when we change our mind.
I fully agree. It's best to have the well-thought-out solution rather than the quick solution for something like this.
Moving the conversation back here now that we're into specifics for broom.mixed
...
I will add the estimated
column to tidy.varFunc()
output. The only open questions to me are:
estimated
to all outputs or only when some varFunc
parameters are not estimated? I tend not to like outputs that have differing structure based on the inputs, but in this case, I think that it is better to only add estimated
when there are fixed values. The reason is that I think fixed
is very rarely used. (I'm using the egocentric definition of "rarely" that I'd never used it and I've not seen it used by others.) Since I think that it's very rarely used, I think that the difference between estimate
and estimated
would be confusing more often than helpful.ran_pars
and fixed
), should we set estimated
to TRUE
instead of NA
? I think that if the column is there, it should be set to something other than NA
, but we should not add the column just to set it to TRUE
.nlme
, you can fix the residual error to a specific value (usually 1). So, we should consider adding the estimated
column to that residual error value when it is fixed. Do you agree? (If so, I'll probably make a separate PR for that unless it's just a few lines more.)Whoops, I didn't mean to close this.
As usual there are two conflicting rubrics here: (1) simplicity (don't clutter up all tidied models with every possible column that could apply to every possible category of model that broom.mixed
covers); (2) uniformity of output (output format/dimensions should be consistent across inputs).
The availability of dplyr::bind_rows()
(which easily handles incongruent sets of columns in its arguments) makes the inconsistency problem much less annoying.
I think I agree with your suggestions (1: add estimated
only when there are at least some non-estimated parameters; 2: set non-estimated parameters to TRUE; 3: to be consistent with point 1, do add the estimated
column if there are any non-estimated parameters).
In documentation (or as a first-class function, or in auxiliary code in inst/...
) we could add a "helper function" that binds together tidied frames with and without fixed (non-estimated) parameters:
bind_tidy <- function(...) {
add_estimated <- function(x) {
if (!"estimated" %in% names(x)) mutate(x, estimated=TRUE) else x))
L <- list(...)
## was a list passed as an argument?
if (length(L) == 1 && !is.data.frame(L)) L <- L[[1]]
L %>% map(add_estimated) %>% bind_rows()
}
or something like that (easier to use replace_na(estimated, TRUE)
but that would fail to
distinguish some case where we actually wanted estimated
to be NA
... ???
Maybe overkill ...
I think that the bind_tidy()
function does seem like overkill. I would add something like the following to the documentation. It works regardless of the column existing, and it's one line of added code without requiring a lot more code on our side.
I'll make the updates as you indicate above, and I'll include an example like the below into the documentation.
library(tidyverse)
not_really_a_tidied_model <-
tibble(
term=c("A", "B")
)
not_really_a_tidied_model_with_estimated <-
tibble(
term=c("A", "B"),
estimated=c(FALSE, TRUE)
)
# Learned from https://stackoverflow.com/questions/45146688/execute-dplyr-operation-only-if-column-exists
not_really_a_tidied_model %>%
filter(across(any_of("estimated"), ~.x))
#> # A tibble: 2 x 1
#> term
#> <chr>
#> 1 A
#> 2 B
not_really_a_tidied_model_with_estimated %>%
filter(across(any_of("estimated"), ~.x))
#> # A tibble: 1 x 2
#> term estimated
#> <chr> <lgl>
#> 1 B TRUE
Created on 2021-08-12 by the reprex package (v2.0.0)
I just made the changes agreed above. The only change that I didn't make is adding the "estimated" column when sigma is fixed. The reason I didn't add that is that I didn't see how that information is stored in the model object. Without trying to parse the call itself, I don't see how to figure it out. (And, parsing the call seems like it would be error-prone.)
Fix #114
The main thing that I'm less sure of is how to handle
fixed
parameters. I added " ; fixed" to the term which seemed like the best way. But, I've never used them before, so I'm not sure that it's correct.I think that I have handled all classes (via the generic varFunc and a separate varComb method). I have included the formula in the groups description, and I am extracting the
coef()
using the same arguments used in printing the output.