Open emstruong opened 1 week ago
This seems like a perfectly reasonable request (although it would break backward compatibility). For the places where one platform's tidy output doesn't obviously dominate the other (e.g. we presumably want to have `rhat` and `ess` available for both platforms, not neither ...), which way do you suggest we make them consistent? E.g. should both platforms return robust or non-robust summaries by default?
Any chance of a pull request ...?
I would normally agree to do a pull-request, but I am unfortunately far past my max bandwidth and really shouldn't agree to anything. Sorry.
My memory is that `rstanarm` and `brms` are not entirely consistent in whether the reported parameter estimates (`summary(mod)`) are robust or not. Although `rstanarm` prints the robust estimates (`print(mod)`), the summary of the `rstanarm` model seems to report the non-robust estimates, whereas `brms` just has the non-robust estimates.
In my opinion, tidy should be consistent with the original package's intent, so I think we should keep `tidy.brmsfit` non-robust. It may be better to ask the `rstanarm` people what they really want.
Although, if it were up to me, they would both default to the robust estimates. For context, I'm developing a Monte Carlo sim comparing rstanarm/brms to lmer--when used with as many default settings as possible--and what I found is that the non-robust estimates can sometimes be beyond completely absurd.
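To make the robust versus non-robust distinction concrete, here is a minimal base-R sketch. The draws are simulated, not output from rstanarm or brms, and the pairing of mean/SD with "non-robust" and median/MAD with "robust" is an assumption about what the two kinds of summary compute.

```r
# Simulated "posterior draws": mostly well-behaved, plus a few extreme values.
# Illustrative only; these are not draws from an rstanarm or brms fit.
set.seed(1)
draws <- c(rnorm(3990, mean = 0.5, sd = 0.1), rep(50, 10))

# Non-robust summary (assumed): mean and standard deviation of the draws.
non_robust <- c(estimate = mean(draws), std.error = sd(draws))

# Robust summary (assumed): median and scaled median absolute deviation.
robust <- c(estimate = median(draws), std.error = mad(draws))

print(rbind(non_robust, robust))
```

A handful of extreme draws is enough to inflate the mean and especially the SD, while the median and MAD barely move.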
If I were to request an early Christmas gift, we'd also get the bulk and tail ESS of the parameter estimates.
Though... if I were to make a pull request, should I adjust the rstanarm tidy function to still depend on `fit$ses`, or can I just manually compute all the `ses` myself? It seems like rstanarm will take a while to implement their fix...
So I don't know enough about HMC or Bayes to know if this is a problem, but `broom.mixed:::tidy.rstanarm()` uses `rstanarm::VarCorr()` to get the estimates of the random parameters. But `rstanarm::VarCorr()` uses `colMeans()` of each of the `^Sigma` columns. I would've thought that a robust estimate would use the median of the column as opposed to the mean of the column.
EDIT: Given that `tidy(fit, robust = TRUE)` and `tidy(fit, robust = FALSE)` can vary for `brms` models for the random parameters, I'm going to assume that the use of `colMeans()` versus the median matters. What slightly confuses me is that `tidy.brmsfit` seems to simply apply the median or mean to the draws, whereas `tidy.rstanarm` gets the `VarCorr`.
I assume that this is due to differences in the parameterization of the random parameter between `brms` and `rstanarm`?
I wouldn't necessarily assume that. I haven't dug into this or thought about it carefully, but a lot of the machinery of these two methods may have been contributed by others/stolen from other places, so inconsistencies might be entirely accidental ... FWIW `brms:::VarCorr.brmsfit` has a `robust` argument, so `tidy.brmsfit` could (and probably should) use it instead of working with the draws directly. But that doesn't help with `tidy.rstanarm`; you could include a wish for a `robust` option in your existing open rstanarm issue ...
Oh, I think there may be a misunderstanding. What I'm guessing is happening is that in brms, the Stan code is such that you get the draws of the standard deviation of the random effects directly. However, in rstanarm, the draws are some precursor to the standard deviation of the random effects. That's what I mean by differences in parameterization.
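A small base-R sketch of why the parameterization could matter for the summaries. It assumes, purely for illustration, that the "precursor" is a variance whose square root is the standard deviation; the draws are simulated and the variable names are hypothetical.

```r
# Simulated variance draws (illustrative, not from a fitted model).
# An odd number of draws makes the median an actual draw, so the
# median comparison below is exact.
set.seed(7)
var_draws <- rlnorm(1001, meanlog = 0, sdlog = 1)
sd_draws <- sqrt(var_draws)

# Mean: summarising variances and then taking sqrt differs from
# summarising the SD draws directly.
print(c(sqrt_of_mean_var = sqrt(mean(var_draws)), mean_of_sd = mean(sd_draws)))

# Median: the two routes agree, because sqrt() is monotone.
print(c(sqrt_of_median_var = sqrt(median(var_draws)), median_of_sd = median(sd_draws)))
```

Under that assumption, the median-based summary does not depend on which scale the draws are stored on, whereas the mean-based one does.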
Regardless, I understand wanting to work with `brms:::VarCorr.brmsfit` instead of directly with the draws, but I'm getting the feeling that it may be better to work directly with whatever output the package itself is providing, as opposed to computing it ourselves or even using other nominally related packages. When I tried getting the `tail_ess` from the draws using `posterior::ess_tail()`, it gave me a slightly different number than the `tail_ess` of the `brms` summary output.
So I'm confused...
I absolutely agree that using package-specific accessors wherever possible is the best practice. This is why it might be nice to request a `robust` option for `rstanarm::VarCorr` ... it would seem easy enough to add by adding the argument and changing one line of code accordingly, i.e. something like
    sumfun <- if (robust) median else mean
    scols <- grepl("^Sigma\\[", colnames(mat))
    # MARGIN = 2 summarises each column, matching colMeans()
    Sigma <- apply(mat[, scols, drop = FALSE], 2, sumfun)
There would be a tiny performance cost using `apply()` rather than `colMeans` for non-robust estimates - could also do
    scols <- grepl("^Sigma\\[", colnames(mat))
    if (robust) {
      # column-wise medians (MARGIN = 2)
      Sigma <- apply(mat[, scols, drop = FALSE], 2, median)
    } else {
      Sigma <- colMeans(mat[, scols, drop = FALSE])
    }
if preferred ...
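As a self-contained illustration of the idea, here is a sketch on a simulated draws matrix. `summarise_sigma()` is a hypothetical helper written for this example, not a function in rstanarm or broom.mixed, and the column names only imitate the `Sigma[...]` naming pattern matched by the regular expression above.

```r
# Hypothetical helper: summarise every "Sigma[...]" column of a
# draws-by-parameters matrix, with a mean (default) or a median (robust).
summarise_sigma <- function(mat, robust = FALSE) {
  scols <- grepl("^Sigma\\[", colnames(mat))
  sub <- mat[, scols, drop = FALSE]
  # MARGIN = 2 gives one summary per column, like colMeans().
  if (robust) apply(sub, 2, median) else colMeans(sub)
}

# Simulated draws (illustrative only): one fixed effect, one skewed variance.
set.seed(42)
mat <- cbind(
  "(Intercept)" = rnorm(1000),
  "Sigma[g:(Intercept),(Intercept)]" = rlnorm(1000, meanlog = 0, sdlog = 1)
)

print(summarise_sigma(mat, robust = FALSE))
print(summarise_sigma(mat, robust = TRUE))
```

For a right-skewed column like this one, the mean sits noticeably above the median, which is the kind of gap the `robust` switch would expose.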
Hold on now though, but doesn't your one-liner require that we apply it on the draws? I thought you said you were against applying it on the draws. :sweat_smile:
Well, either way, the one-liner alone won't make `rstanarm` produce the standard errors for the random parameters. We'd probably need them to fix things up first.
The code I showed is based on the `rstanarm` code (i.e., material for a pull request there), not the `broom.mixed` code ...
It would be a really nice and appreciated feature if the `tidy.brmsfit` and `tidy.stanreg` functions had overlapping arguments and functionality.
Some notable differences are:
- `tidy.brmsfit` uses non-robust estimates by default, whereas `tidy.stanreg` uses the robust estimate
- `tidy.stanreg` doesn't seem to allow the extraction of `rhat` and `ess` for each parameter, whereas `tidy.brmsfit` does
- `tidy.brmsfit` reports the standard errors and confidence intervals for all parameters. Yet for `tidy.stanreg`, they seem to be missing for random effects for some reason that I haven't been able to figure out. This may be an `rstanarm` problem with what it's storing in `ses`
- `tidy.brmsfit`'s documentation says that you can get "ran_vals", but the default arguments/documentation of usage implies that it's only fixed or random parameters
Here is a reprex of the differences
Created on 2024-11-15 with reprex v2.1.1