FixedEffects / FixedEffectModels.jl

Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables
Other
225 stars 46 forks source link

Add loglikelihood and nullloglikelihood #239

Closed junder873 closed 1 year ago

junder873 commented 1 year ago

Add StatsAPI functions related to loglikelihood and nullloglikelihood. These are necessary for computing some R-squared values (McFadden, etc.), so those are also tested.

The only breaking change in this is a change to dof. Currently, the reported dof is the number of parameters in the model. In order to correctly compute some of the adjusted R-squared values based on loglikelihood, this needs to be the total number of coefficients in the model including the fixed effects

matthieugomez commented 1 year ago

Can you give more details about `` In order to correctly compute some of the adjusted R-squared values based on loglikelihood, this needs to be the total number of coefficients in the model including the fixed effects''? Why does this require to define dof this way? Should not we make it easier to access dof_fes instead?

junder873 commented 1 year ago

I am not completely clear on what the original dof was. I understand it was used to calculate the F-stat, but the actual value always seemed confusing to me since it was just the dof lost due to the formula but not the fixed effects. So, I admit that part of the change was to try and make the number more in line with what I think it should be.

However, the main reason to change the value is the new value conforms to what StatsAPI.jl seems to expect. Specifically, in the adjr2 definition (line 300), it asks for dof(model), to get the right adjr2, the dof I defined here is necessary.

So an alternative that would not be breaking would be to define something like:

function StatsAPI.adjr2(model::FixedEffectModel, variant::Symbol)
    k = dof(model) + dof_fe(model)
    if variant == :McFadden
        ll = loglikelihood(model)
        ll0 = nullloglikelihood(model)
        1 - (ll - k)/ll0
    elseif variant == :devianceratio
        n = nobs(model)
        dev  = deviance(model)
        dev0 = nulldeviance(model)
        1 - (dev*(n-1))/(dev0*(n-k))
    else
        throw(ArgumentError("variant must be one of :McFadden or :devianceratio"))
    end
end

in this package and add some ability to access the dof due to fixed effects.

matthieugomez commented 1 year ago

Ok, thanks. Stata areg and reghdfe only report the number of non-fixedeffects coefficients in the degree of freedom, so I'm wary to change it. I would prefer to rewrite adjr2 in the way you suggest, as well as add and export a function dof_fes which returns the number of fixed effects.

junder873 commented 1 year ago

Done.

I also slightly adjusted how adjr2 is calculated since it was affected by clustering. The new numbers correspond to what Stata provides (so I also added a test for it).

matthieugomez commented 1 year ago

Thanks a lot.

  1. Is your formula for adjusted r2 still valid when the formula does not have an intercept?
  2. Any reason to use dof_fe instead of dof_fes for the name of the exported functions?
junder873 commented 1 year ago
  1. I think related to #179 and #173, if there is truly no intercept then this package reports very different r2 (and adjr2) values. However, with fixed effects, explicitly removing the intercept does not change the results.
  2. I do not have a preference, so I changed the function to dof_fes. It is also not currently exported (do you want it to be?)

I also realized there was a small error in how it was being, for some reason two different computers gave different adjr2 values, but updating packages seems to have solved it.

matthieugomez commented 1 year ago

Ok. Yes, please export it.

If you have time, it would be nice to also remove the computation of r2 r2adj from the fit.jl file and define two functions StatsAPI.r2(m::FixedEffectModel, variant::Symbol = :devianceratio) and StatsAPI.adjr2(m::FixedEffectModel, variant::Symbol = :devianceratio) (compared to your PR, add StatsAPI.r2 + default argument for variant). Just let me know if you can do it — I'll merge the PR either way.

junder873 commented 1 year ago

Added those features. One change I was not sure how to add was the dof_add optional argument. This is not saved and originally only affected adjr2, I included it in the main dof, but this would affect the variance calculations.

I also had to implement a little more of the R2/Adjr2 from StatsAPI since this package defines deviance=tss while StatsAPI expects deviance=rss

matthieugomez commented 1 year ago

Thanks! Yeah you can remove dof_add. I don’t think anyone used it.

The second part is a mistake, deviance should follow StatsAPI definition. Could you correct it?

junder873 commented 1 year ago

Made those fixes. I left in dof_add as a keyword argument to prevent breaking someone's code if they assign it, but it does nothing now.

matthieugomez commented 1 year ago

Great! It looks like there is a conflict with master due to the last pull request (which exports dof, r², and adjr²) which prevents me to merge yours. Can you solve it?

junder873 commented 1 year ago

Should be resolved now.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 93.93% and project coverage change: +0.48 :tada:

Comparison is base (57694b7) 95.57% compared to head (d6886af) 96.06%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #239 +/- ## ========================================== + Coverage 95.57% 96.06% +0.48% ========================================== Files 8 8 Lines 610 635 +25 ========================================== + Hits 583 610 +27 + Misses 27 25 -2 ``` | [Impacted Files](https://app.codecov.io/gh/FixedEffects/FixedEffectModels.jl/pull/239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | | |---|---|---| | [src/FixedEffectModels.jl](https://app.codecov.io/gh/FixedEffects/FixedEffectModels.jl/pull/239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL0ZpeGVkRWZmZWN0TW9kZWxzLmps) | `100.00% <ø> (ø)` | | | [src/partial\_out.jl](https://app.codecov.io/gh/FixedEffects/FixedEffectModels.jl/pull/239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL3BhcnRpYWxfb3V0Lmps) | `95.58% <ø> (ø)` | | | [src/FixedEffectModel.jl](https://app.codecov.io/gh/FixedEffects/FixedEffectModels.jl/pull/239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL0ZpeGVkRWZmZWN0TW9kZWwuamw=) | `93.24% <93.10%> (+2.99%)` | :arrow_up: | | [src/fit.jl](https://app.codecov.io/gh/FixedEffects/FixedEffectModels.jl/pull/239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL2ZpdC5qbA==) | `96.34% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

matthieugomez commented 1 year ago

Thanks!