Open mattansb opened 2 years ago
Can you check what r2.default() returns?
R is (correctly) adjusting its computation of the model sum of squares (mss in the code from summary.lm()) based on whether there's an intercept or not.
We shouldn't expect these two results to be the same because your model has an intercept in effect even though you told R it didn't.
# Excerpt from stats:::summary.lm() (unweighted case)
r <- z$residuals
f <- z$fitted.values
mss <- if (attr(z$terms, "intercept"))
    sum((f - mean(f))^2) else sum(f^2)
rss <- sum(r^2)
if (p != attr(z$terms, "intercept")) {
    df.int <- if (attr(z$terms, "intercept")) 1L else 0L
    ans$r.squared <- mss/(mss + rss)
    ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf)
    ans$fstatistic <- c(value = (mss/(p - df.int))/resvar,
                        numdf = p - df.int, dendf = rdf)
} else ans$r.squared <- ans$adj.r.squared <- 0
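To make the branch on the intercept attribute concrete, here is a minimal sketch that recomputes R2 by hand and compares it with what summary() reports. The helper name manual_r2 and the use of the built-in mtcars data are assumptions for illustration only; the sketch ignores weights and offsets, which summary.lm() also handles.

```r
# Hypothetical helper: mirrors only the unweighted, no-offset logic quoted above
manual_r2 <- function(z) {
  r <- z$residuals
  f <- z$fitted.values
  has_int <- attr(z$terms, "intercept") == 1
  # With an intercept the fitted values are centered; without one they are not
  mss <- if (has_int) sum((f - mean(f))^2) else sum(f^2)
  rss <- sum(r^2)
  mss / (mss + rss)
}

# Assumed example models (mtcars ships with R)
m_int   <- lm(mpg ~ factor(cyl), data = mtcars)
m_noint <- lm(mpg ~ 0 + factor(cyl), data = mtcars)

# Both comparisons should agree with summary.lm()
all.equal(manual_r2(m_int), summary(m_int)$r.squared)
all.equal(manual_r2(m_noint), summary(m_noint)$r.squared)
```

The two models span the same column space, so the only thing that changes between them is which formula for mss is used.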
What is the justification for this formula @bwiernik ? I haven't come across it before...
See this great StackExchange answer https://stats.stackexchange.com/a/26205/364001
Problem
R2 is affected by the absence of an intercept:
Same parameters…
Same dfs…
Same predictions…
But…
This isn’t an issue in performance itself, but R seems to give different R2 values depending on whether or not the model formula includes an intercept.
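The reprex output is not preserved in this thread, so the following is only a sketch of the kind of comparison described. It assumes the built-in mtcars data and a hypothetical constant column const that stands in for the intercept; these are not the original models.

```r
# Assumed setup: a column of ones replaces R's implicit intercept
dat <- mtcars
dat$const <- 1

m_int   <- lm(mpg ~ hp, data = dat)
m_noint <- lm(mpg ~ 0 + const + hp, data = dat)

# Same parameters
all.equal(unname(coef(m_int)), unname(coef(m_noint)))

# Same dfs
df.residual(m_int) == df.residual(m_noint)

# Same predictions
all.equal(fitted(m_int), fitted(m_noint))

# But different R2, because summary.lm() sees no intercept in m_noint
summary(m_int)$r.squared
summary(m_noint)$r.squared
```

In this sketch the second model reports the larger R2, since its model sum of squares is taken around zero rather than around the mean of the fitted values.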
Created on 2022-10-14 by the reprex package (v2.0.1)
Solution
Perhaps we can add an r2_prediction() function which computes the R2 value based on the predictions? Either as $R^2 = r_{\hat{y},y}^2$
Or (my preferred method) as $R^2 = 1 - \frac{Var(y - \hat{y})}{Var(y)}$
This will allow for correct R2 for transformed outcomes (using smearing), for non-linear models, etc...
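A rough sketch of what such a function might look like follows. The signature, the method argument, and the example data are all assumptions; r2_prediction is only a proposal in this issue, not an existing function.

```r
# Hypothetical sketch of the proposed r2_prediction(); the interface is assumed
r2_prediction <- function(y, y_hat, method = c("variance", "correlation")) {
  method <- match.arg(method)
  if (method == "correlation") {
    # Squared correlation between observed and predicted values
    cor(y, y_hat)^2
  } else {
    # 1 - Var(y - y_hat) / Var(y)
    1 - var(y - y_hat) / var(y)
  }
}

# Assumed example: identical predictions, with and without a formal intercept
dat <- mtcars
dat$const <- 1
m_int   <- lm(mpg ~ hp, data = dat)
m_noint <- lm(mpg ~ 0 + const + hp, data = dat)

r2_prediction(dat$mpg, fitted(m_int))
r2_prediction(dat$mpg, fitted(m_noint))
r2_prediction(dat$mpg, fitted(m_noint), method = "correlation")
```

Because both definitions depend only on the observed and predicted values, they return the same result for the two models regardless of how the intercept was specified.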