JuliaStats / HypothesisTests.jl

Hypothesis tests for Julia
MIT License

Tests for RegressionModel #121

Open Nosferican opened 6 years ago

Nosferican commented 6 years ago

I wanted to know if this package would be a good place to develop a suite of tests for Regression Models or if it should be a different one.

The idea would be to provide a few tests such as:

struct WaldTest <: HypothesisTest
    value::Float64  # test statistic
    dist::FDist     # reference distribution under the null
    function WaldTest(model::StatsBase.RegressionModel;
                      restrictions::Union{Matrix,Vector{Symbol}} = coefnames(model))
        ...
    end
end

These tests would make use of the methods provided by StatsBase as much as possible. A few that might be implemented are:

Hypothesis tests

[ ] Wald test
[ ] Lagrange multiplier test (LM test)
[ ] Likelihood ratio test (LR test)
[ ] Sargan-Hansen test
[ ] Durbin-Wu-Hausman test
[ ] Chow test

Whenever possible, each test would compute its robust version.

andreasnoack commented 6 years ago

I don't know if the scope of this package has ever been defined, but I've typically thought of it as a collection of functions for testing hypotheses on fairly "simple" parameters, mostly related to the mean. However, the time series section doesn't really follow this "rule" and is also more econometrics inspired. If we add regression tests to this package, a regression package (most likely GLM) should probably be added as a dependency, so maybe this is not the right place for these.

An alternative could be to make a more batteries-included econometrics package with all of the typical econometrics textbook tools handy. Because, while I believe 1-3 are generally used across branches of statistics (although 2 under a different name), I haven't seen 4-6 used outside econometrics.

Nosferican commented 6 years ago

Well the idea is to have them work for any StatsBase.RegressionModel so hopefully no regression package will be added as a dependency. Packages would just have to implement the methods in StatsBase and maybe one or two additional methods at most. I could probably write a package for RegressionTests and host the regression / econometrics / time series ones. Those when applicable could call methods defined here if suitable.

Would it be possible to define testname and pvalue in StatsBase, and maybe a hierarchy of test types there if it makes sense?
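To make the question concrete, here is a minimal sketch of what such a shared interface might look like. All names (`AbstractHypothesisTest`, `RegressionTest`, `ToyTest`, `report`) are hypothetical and chosen for illustration; none of this is StatsBase's actual API.

```julia
# Hypothetical type hierarchy; these names are assumptions, not existing StatsBase types.
abstract type AbstractHypothesisTest end
abstract type RegressionTest <: AbstractHypothesisTest end

# Generic functions with no methods yet; packages would add methods for their own tests.
function testname end
function pvalue end

# A toy test that stores its p-value directly, only to show the extension pattern.
struct ToyTest <: RegressionTest
    p::Float64
end
testname(::ToyTest) = "Toy test"
pvalue(t::ToyTest) = t.p

# Code written against the abstract type works for any test implementing the two methods.
report(t::AbstractHypothesisTest) = string(testname(t), ": p = ", pvalue(t))

println(report(ToyTest(0.25)))
```

With definitions like these living in a lightweight base package, a separate regression-tests package could extend `testname` and `pvalue` without depending on HypothesisTests.jl.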

nalimilan commented 6 years ago

I don't think that's the right package. At least for tests as general as LRT and Wald, I think it would be fine to have them in StatsBase, or maybe in StatsModels if we decide it's not just about translating formulas into model matrices. For others, I'm not sure. Could they be defined using only a limited set of methods that are part of the RegressionModel interface?

Nosferican commented 6 years ago

I believe the tests could be structs whose constructors take a RegressionModel (and maybe some keyword arguments) and which have two fields, a value and a distribution, plus other fields or methods for summary data (testname, restrictions, null hypothesis, etc.). The regression tests usually run some regression and then test some restrictions, which results in a Wald test or LR test. Thus StatsBase could host the basic hypothesis tests (the first three), and the regression tests would be wrappers that construct one of those. The same goes for Breusch-Pagan and a few others already implemented.
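A rough sketch of that design, assuming Distributions.jl is installed. The names `WaldSketch` and `exclusion_test` are hypothetical, and the constructor takes a coefficient vector and covariance matrix directly rather than a RegressionModel; in a real implementation those would presumably come from `coef(model)` and `vcov(model)`. The sketch uses the asymptotic chi-squared reference distribution rather than the F distribution.

```julia
using Distributions   # Chisq and ccdf
using LinearAlgebra   # dot and I

# Basic test: a statistic plus its reference distribution under the null.
struct WaldSketch
    value::Float64
    dist::Chisq
end

# Wald statistic for H0: R*β = q, given the coefficient covariance matrix V.
function WaldSketch(β::AbstractVector, V::AbstractMatrix, R::AbstractMatrix, q::AbstractVector)
    d = R * β - q
    return WaldSketch(dot(d, (R * V * R') \ d), Chisq(size(R, 1)))
end

pvalue(t::WaldSketch) = ccdf(t.dist, t.value)

# Wrapper: build the restriction matrix for "these coefficients are all zero"
# and delegate to the basic test.
function exclusion_test(β::AbstractVector, V::AbstractMatrix, idx::AbstractVector{<:Integer})
    k = length(β)
    R = Matrix{Float64}(I, k, k)[idx, :]
    return WaldSketch(β, V, R, zeros(length(idx)))
end

t = exclusion_test([1.0, 2.0], [1.0 0.0; 0.0 1.0], [2])
println((t.value, pvalue(t)))
```

Passing a robust covariance matrix as `V` would give the robust version of the test with no other change.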

nalimilan commented 6 years ago

Yes, but what methods do they need the input model object to implement?

Also see how the F-test has been implemented recently in GLM: https://github.com/JuliaStats/GLM.jl/pull/182. We should probably take a similar approach for the LRT, while other tests could work differently since there are no models to compare IIUC.
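For the LRT, comparing models could be sketched as below. `FittedSummary` and `lr_test` are hypothetical names; the struct stands in for a fitted model, and in practice the two numbers would presumably be obtained from `loglikelihood(model)` and `dof(model)`.

```julia
# Hypothetical stand-in for a fitted model: only the log-likelihood and the
# number of estimated parameters are needed for the LR statistic.
struct FittedSummary
    loglik::Float64
    nparams::Int
end

# LR statistic and degrees of freedom for a restricted model nested in a full one.
function lr_test(restricted::FittedSummary, full::FittedSummary)
    full.nparams > restricted.nparams ||
        throw(ArgumentError("the full model must have more parameters than the restricted one"))
    stat = 2 * (full.loglik - restricted.loglik)
    return (statistic = stat, df = full.nparams - restricted.nparams)
end

result = lr_test(FittedSummary(-120.5, 2), FittedSummary(-115.0, 4))
println(result)
```

Under the null hypothesis the statistic is asymptotically chi-squared with `df` degrees of freedom, so a p-value follows from the upper tail of that distribution.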

Nosferican commented 6 years ago

I would have to work a bit on those and get back to you. For most, I believe modelmatrix, residuals, dof_residual, coef, coefnames, vcov, deviance, loglikelihood, etc. would be enough; for others we might have to feed in multiple RegressionModels. I implemented a Wald test before GLM got theirs, so I haven't checked theirs yet. For all tests I tend to go for the robust version (usually à la Wooldridge). In many cases it is a generalization: it coincides with the classical test under the spherical-error and independence assumptions, and remains correct when a robust vcov is used.
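The claim that the robust version nests the classical one can be illustrated with the OLS covariance matrix. This is a minimal sketch with hypothetical function names (`sandwich_vcov`, `classical_vcov`) and made-up data: with equal error variances the sandwich form collapses to the classical formula.

```julia
using LinearAlgebra

# Sandwich covariance of the OLS estimator, given one error variance per observation.
function sandwich_vcov(X::AbstractMatrix, ω::AbstractVector)
    bread = inv(X' * X)
    meat = X' * Diagonal(ω) * X
    return bread * meat * bread
end

# Classical covariance under spherical errors with common variance σ².
classical_vcov(X::AbstractMatrix, σ²::Real) = σ² * inv(X' * X)

# Illustrative design matrix (intercept plus one regressor); values are made up.
X = [1.0 0.5; 1.0 1.5; 1.0 2.5; 1.0 4.0]

V_robust = sandwich_vcov(X, fill(2.0, 4))   # every observation has variance 2.0
V_classical = classical_vcov(X, 2.0)
println(V_robust ≈ V_classical)
```

In a heteroskedasticity-robust estimator such as HC0, the squared residuals take the place of `ω`, so the two matrices differ only when the error variances are unequal.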

Nosferican commented 6 years ago

@lbittarello I saw that your package also had a Hausman test implementation.