Closed rpruim closed 2 years ago
The var2()
fix is easy. We can just formally introduce McElreath's definition from the rethinking GitHub repo:
var2 <- function( x , na.rm=TRUE ) {
# use E(x^2) - E(x)^2 form
mean(x^2) - mean(x)^2
}
Please say more about your second bullet.
There isn't really a reason to use var2()
. It is just dividing by n
instead of by n - 1
(hidden in the means); those denominators cancel in either case. This seems like a total distractor to the conversation. One could argue in favor of using sums of squares instead of variance to avoid the denominator, but I don't see any real reason for preferring var2()
.
The fact that McElreath documents this function as internal and not intended for users suggest that this was just an oversight on his part.
The standard definition of a residual is $y_i - \hat{y}_i$ = observed response - predicted response. Since Var(-X) = Var(X), it doesn't matter for the computation, but it is confusing to students who know how a residual should be defined. I would define the function as
R2_is_bad <- function(quap_fit, seed = 7, ...) {
set.seed(seed)
l <- link(quap_fit, refresh = 0, ...)
r <- Brains$brain_std - apply(l, 2, mean)
1 - var(r) / var(Brains$brain_std)
}
I'll meet you half way. Since the var2()
issue seems like a distraction from the larger issue, I'm just going to leave it in. I do think you've made a good point about using the standard definition of a residual, though. Thus, I'm going to update the formula to
R2_is_bad <- function(brm_fit, seed = 7, ...) {
set.seed(seed)
p <- predict(brm_fit, summary = F, ...)
# in my experience, it's more typical to define residuals as the criterion minus the predictions
r <- d$brain_std - apply(p, 2, mean)
1 - rethinking::var2(r) / rethinking::var2(d$brain_std)
}
@ASKurz, I would NOT do this. There is no guarantee that rethinking will continue to export var2()
and it is undocumented, except for documentation indicating it is only intended for internal use. The use of ::
doesn't really help with either of these problems.
If you want to keep using var2()
(which I would also not since var() will work just fine here and avoid all these problems), I would suggest that you define it in your code -- pretending it does not exist in the rethinking package -- so that it is clear what it is and doesn't rely on something not intended for public use. I thought that is what you meant in your post from April 3.
In your code comment, did you mean "criterion"? I'm not familiar with using that term in this context.
Yep. By "criterion," I mean the focal variable in the regression analysis. The DV, y variable, or response variable, if you will.
These both match what is in the text, but I think they are bad ideas.
var2()
does, and the documentation says: "These functions are not intended to be called directly." (In a cleaner implementation of the package, they would not be exported.)