Closed junder873 closed 1 year ago
Can you give more details about `` In order to correctly compute some of the adjusted R-squared values based on loglikelihood, this needs to be the total number of coefficients in the model including the fixed effects''? Why does this require to define dof this way? Should not we make it easier to access dof_fes instead?
I am not completely clear on what the original dof was. I understand it was used to calculate the F-stat, but the actual value always seemed confusing to me since it was just the dof lost due to the formula but not the fixed effects. So, I admit that part of the change was to try and make the number more in line with what I think it should be.
However, the main reason to change the value is the new value conforms to what StatsAPI.jl seems to expect. Specifically, in the adjr2 definition (line 300), it asks for dof(model)
, to get the right adjr2, the dof I defined here is necessary.
So an alternative that would not be breaking would be to define something like:
function StatsAPI.adjr2(model::FixedEffectModel, variant::Symbol)
k = dof(model) + dof_fe(model)
if variant == :McFadden
ll = loglikelihood(model)
ll0 = nullloglikelihood(model)
1 - (ll - k)/ll0
elseif variant == :devianceratio
n = nobs(model)
dev = deviance(model)
dev0 = nulldeviance(model)
1 - (dev*(n-1))/(dev0*(n-k))
else
throw(ArgumentError("variant must be one of :McFadden or :devianceratio"))
end
end
in this package and add some ability to access the dof due to fixed effects.
Ok, thanks. Stata areg
and reghdfe
only report the number of non-fixedeffects coefficients in the degree of freedom, so I'm wary to change it. I would prefer to rewrite adjr2 in the way you suggest, as well as add and export a function dof_fes which returns the number of fixed effects.
Done.
I also slightly adjusted how adjr2 is calculated since it was affected by clustering. The new numbers correspond to what Stata provides (so I also added a test for it).
Thanks a lot.
I also realized there was a small error in how it was being, for some reason two different computers gave different adjr2 values, but updating packages seems to have solved it.
Ok. Yes, please export it.
If you have time, it would be nice to also remove the computation of r2 r2adj from the fit.jl file and define two functions StatsAPI.r2(m::FixedEffectModel, variant::Symbol = :devianceratio) and StatsAPI.adjr2(m::FixedEffectModel, variant::Symbol = :devianceratio) (compared to your PR, add StatsAPI.r2 + default argument for variant). Just let me know if you can do it — I'll merge the PR either way.
Added those features. One change I was not sure how to add was the dof_add
optional argument. This is not saved and originally only affected adjr2, I included it in the main dof, but this would affect the variance calculations.
I also had to implement a little more of the R2/Adjr2 from StatsAPI since this package defines deviance=tss while StatsAPI expects deviance=rss
Thanks! Yeah you can remove dof_add. I don’t think anyone used it.
The second part is a mistake, deviance should follow StatsAPI definition. Could you correct it?
Made those fixes. I left in dof_add as a keyword argument to prevent breaking someone's code if they assign it, but it does nothing now.
Great! It looks like there is a conflict with master due to the last pull request (which exports dof, r², and adjr²) which prevents me to merge yours. Can you solve it?
Should be resolved now.
Patch coverage: 93.93
% and project coverage change: +0.48
:tada:
Comparison is base (
57694b7
) 95.57% compared to head (d6886af
) 96.06%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Thanks!
Add StatsAPI functions related to loglikelihood and nullloglikelihood. These are necessary for computing some R-squared values (McFadden, etc.), so those are also tested.
The only breaking change in this is a change to dof. Currently, the reported dof is the number of parameters in the model. In order to correctly compute some of the adjusted R-squared values based on loglikelihood, this needs to be the total number of coefficients in the model including the fixed effects