clintonTE closed this issue 3 years ago.
In the absence of an intercept, FixedEffectModels.jl computes the r2 as `1 - sum(ϵ.^2)/sum(y.^2)`, which gives 0.9155.
This is similar to what Stata does.
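Here is a minimal sketch of the two definitions being contrasted. It is in Python rather than Julia and uses made-up data: a regression through the origin, scored once with the formula quoted above (uncentered) and once with the usual centered formula. The function names are illustrative and are not part of the FixedEffectModels.jl API.

```python
# Sketch: regress y on x WITHOUT an intercept and compare two R² definitions.
# The data are made up for illustration; the true relationship has an intercept.

def fit_no_intercept(x, y):
    """OLS slope through the origin: b = sum(x*y) / sum(x^2)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    return b, resid

def r2_uncentered(y, resid):
    """1 - sum(e^2) / sum(y^2): the no-intercept formula quoted above."""
    return 1 - sum(e * e for e in resid) / sum(yi * yi for yi in y)

def r2_centered(y, resid):
    """1 - sum(e^2) / sum((y - mean(y))^2): the 'explained variance' definition."""
    ybar = sum(y) / len(y)
    return 1 - sum(e * e for e in resid) / sum((yi - ybar) ** 2 for yi in y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 3.9, 5.2, 5.8, 7.1]  # roughly y = 2 + x

b, resid = fit_no_intercept(x, y)
print(f"slope         = {b:.4f}")
print(f"uncentered R2 = {r2_uncentered(y, resid):.4f}")
print(f"centered R2   = {r2_centered(y, resid):.4f}")
```

On these numbers the uncentered value is about 0.97 and the centered one about 0.60. Both come from the same fit and the same residuals; only the denominator differs.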
That seems more like an artifact of how Stata handles the command in the no-fixed-effects scenario than anything else, though I can't confirm without a license. Without fixed effects, it's neither the explained variance nor the squared correlation between the predictions and y. That is, it will be right with fixed effects (because of the demeaning) but wrong without them. Why not give the correct answer in both cases?
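To illustrate the demeaning point, here is a hedged Python sketch on made-up data (all helper names are hypothetical). Without an intercept, the uncentered formula, the centered formula and the squared correlation between y and the fitted values all differ. After demeaning both variables, which is what an intercept or a single-level fixed effect does, mean(y) is zero, so the two formulas share a denominator and all three quantities coincide.

```python
# Sketch: compare three "R²" quantities on raw vs demeaned data.

def mean(v):
    return sum(v) / len(v)

def demean(v):
    m = mean(v)
    return [vi - m for vi in v]

def ols_through_origin(x, y):
    """Single-regressor OLS with no intercept; returns fitted values and residuals."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    fitted = [b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid

def r2_uncentered(y, resid):
    return 1 - sum(e * e for e in resid) / sum(yi * yi for yi in y)

def r2_centered(y, resid):
    m = mean(y)
    return 1 - sum(e * e for e in resid) / sum((yi - m) ** 2 for yi in y)

def squared_corr(a, b):
    """Squared Pearson correlation between two sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov * cov / (va * vb)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 3.9, 5.2, 5.8, 7.1]

# Raw data, no intercept: the three quantities disagree.
fitted, resid = ols_through_origin(x, y)
print("raw:     ", r2_uncentered(y, resid), r2_centered(y, resid), squared_corr(y, fitted))

# Demeaned data: equivalent to OLS with an intercept; all three agree.
xd, yd = demean(x), demean(y)
fitted_d, resid_d = ols_through_origin(xd, yd)
print("demeaned:", r2_uncentered(yd, resid_d), r2_centered(yd, resid_d), squared_corr(yd, fitted_d))
```

On these numbers the raw values are roughly 0.971, 0.604 and 0.989, while the demeaned ones are all about 0.989.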
There is no right or wrong; there are just different definitions of what R2 means in a model without an intercept.
See: https://stats.stackexchange.com/questions/26176/removal-of-statistically-significant-intercept-term-increases-r2-in-linear-mo or https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-why-are-r2-and-f-so-large-for-models-without-a-constant/
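The identity behind both links can be written with ϵ as the residuals, as in the formula in the first comment. For one fixed set of residuals, the uncentered denominator is never smaller than the centered one:

```latex
\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n\,\bar{y}^2
\quad\Longrightarrow\quad
R^2_{\text{unc}} = 1 - \frac{\sum_i \epsilon_i^2}{\sum_i y_i^2}
\;\ge\; 1 - \frac{\sum_i \epsilon_i^2}{\sum_i (y_i - \bar{y})^2} = R^2_{\text{cen}}
```

Equality holds when $\bar{y} = 0$ (for example after demeaning) or when the fit is perfect. Dropping the intercept also changes the residuals, which is the effect the linked answers discuss.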
I see what you mean, thanks for the links.
Still, it's disconcerting to get a different answer with a manual intercept column. I'm not sure what the most intuitive way to handle it would be; maybe treat a column of 1s as a special case and/or show a warning about the behavior if there are no fixed effects.
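The special-casing idea could be sketched as follows. This is a hypothetical Python helper, not FixedEffectModels.jl code. It uses the centered denominator when there are fixed effects or a literal column of 1s, and the uncentered one otherwise. It only catches a literal constant column, so a set of dummies that merely sums to a constant would slip through. A real implementation would likely need a rank-based check.

```python
# Hypothetical helper: choose the R² denominator based on whether the model
# contains a constant, either via fixed effects or a constant column in X.

def has_constant_column(X, tol=1e-12):
    """True if some column of X (given as a list of rows) is constant and nonzero."""
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        if abs(col[0]) > tol and all(abs(c - col[0]) <= tol for c in col):
            return True
    return False

def r2(y, resid, X, has_fixed_effects=False):
    rss = sum(e * e for e in resid)
    if has_fixed_effects or has_constant_column(X):
        ybar = sum(y) / len(y)
        tss = sum((yi - ybar) ** 2 for yi in y)  # centered
    else:
        tss = sum(yi * yi for yi in y)           # uncentered
    return 1 - rss / tss

# Made-up data; OLS with an intercept via the closed-form simple-regression solution.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 3.9, 5.2, 5.8, 7.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
intercept = ybar - slope * xbar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

X_manual = [[1.0, xi] for xi in x]  # intercept supplied as an ordinary column of 1s
print(r2(y, resid, X_manual))       # constant column detected: centered R²
print(1 - sum(e * e for e in resid) / sum(yi * yi for yi in y))  # uncentered value on the same fit
```

Both lines describe the same fit; without the special case, the manual intercept column would be scored with the second, larger number.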
True. Just checked and the same issue happens in Stata, so not sure what to do.
I'm starting to understand more now...
Possible choices:
a) Just document the behaviour. That would make clear what is going on.
b) Possibly change the API (but not the behaviour)? If you are using `r2()` from statsmodels, it should behave like the other `r2()` methods. Users would benefit if the API signaled that the Stata-derived `r2()` behaves quite differently from GLM or R's `lm`. This is what caused my misunderstanding.
c) Start changing the behaviour of the software, i.e. use the R `lm` calculation when there is no mean (no intercept).
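For option b, combined with the warning idea above, one hedged way to signal the definition in the API is to return it alongside the value and to warn when the uncentered formula is used. `R2Report` and `r2_report` are made-up names for illustration, not an existing API.

```python
import warnings
from dataclasses import dataclass

@dataclass
class R2Report:
    value: float
    definition: str  # "centered" or "uncentered"

def r2_report(y, resid, has_intercept):
    """Hypothetical API: return R² together with the definition that produced it."""
    rss = sum(e * e for e in resid)
    if has_intercept:
        ybar = sum(y) / len(y)
        return R2Report(1 - rss / sum((yi - ybar) ** 2 for yi in y), "centered")
    warnings.warn(
        "model has no intercept or fixed effects; reporting uncentered R² = 1 - RSS/sum(y^2)",
        stacklevel=2,
    )
    return R2Report(1 - rss / sum(yi * yi for yi in y), "uncentered")

y = [3.1, 3.9, 5.2, 5.8, 7.1]
resid = [0.5, -0.3, 0.2, -0.4, 0.1]  # made-up residuals, only to exercise the function
report = r2_report(y, resid, has_intercept=False)
print(report.definition, report.value)
```

Carrying the label in the result keeps the current numbers unchanged (options a and b) while making the definition visible to anyone comparing against GLM or R.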
r2 is wrong when the model has no fixed effects or intercept.
With respect to the use case, when running a mixture of fixed effects specifications and specifications with an intercept only, I find it is often easiest to programmatically account for the intercept as its own column.