JuliaStats / HypothesisTests.jl

Hypothesis tests for Julia
MIT License

Classical Hypothesis Tests: Wald, LR, LM #200

Open azev77 opened 4 years ago

azev77 commented 4 years ago

@nosferican mentioned this in #121 also on discourse... I coded up some classical hypothesis tests (HT). I need help w/ the score test. Does anyone know how to obtain the loglik functions after glm?

using DataFrames, GLM, Distributions

n=7; k=1; X=randn(n,k); β=[2 ;-4]; σ_e = 10.1;
Y= [ones(n) X]*β + σ_e*randn(n);

d = DataFrame(X1= X[:,1], Y=Y);
#d = DataFrame(X1= X[:,1], X2= X[:,2], Y=Y);
#
f0 = @formula(Y ~ 1)
fA = @formula(Y ~ 1 + X1)
#
mH0 = lm(f0, d)
mHA = lm(fA, d)
#
mH0 = glm(f0, d, Normal())
mHA = glm(fA, d, Normal())

# Likelihood Ratio Test
function HT_LR(mH0, mHA)
    LR = 2.0*(loglikelihood(mHA) - loglikelihood(mH0))  |> abs
    df = dof_residual(mH0) - dof_residual(mHA)          |> abs
    pv = 1 - cdf(Chisq(df), LR)
    return LR, pv, df
end
#
HT_LR(mH0, mHA) #better practice
HT_LR(mHA, mH0)
# Wald Test
"https://en.wikipedia.org/wiki/Wald_test"
"H0: R*θ = r"
"HA: R*θ ≠ r"
#
function HT_Wald(mHA, R, r, V)
    θ̂ = coef(mHA)
    A = (R*θ̂ - r)
    #V = vcov(mHA)  #[2,2]
    n = size(mHA.mm.m,1)
    W = A' * inv(R*V*R') * A   #V/n
    df = size(R,1)
    pv = 1 - cdf(Chisq(df), W)
    return W, pv, df
end
R = [0 1]
r = [0]
HT_Wald(mHA, R, r, vcov(mHA))

"LM aka Score test (MATLAB code; gradp/HessMp appear to be numerical gradient/Hessian helpers)"
G = gradp(@nll_lin,theta_r_full,datamat,1);
H_r = HessMp(@nll_lin,theta_r_full,datamat,1);
V_r = inv(H_r);
LM = G*V_r*G';
LM_p = 1-chi2cdf(LM,2);
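
A Julia translation of the MATLAB sketch above might look like the following. It assumes you can already obtain the score vector and information matrix at the restricted estimates (e.g. via StatsBase.score / StatsBase.informationmatrix once models implement them); HT_LM and its argument names are illustrative, not an existing API:

```julia
using Distributions, LinearAlgebra

# Score (LM) test: g is the score (log-likelihood gradient) evaluated at
# the restricted estimates, info the information matrix there, and df
# the number of restrictions. LM = g' * inv(info) * g ~ Chisq(df) under H0.
function HT_LM(g::AbstractVector, info::AbstractMatrix, df::Integer)
    LM = dot(g, info \ g)
    pv = 1 - cdf(Chisq(df), LM)
    return LM, pv, df
end
```

At the unrestricted MLE the score is zero, so LM = 0 and the p-value is 1, which is a quick sanity check.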
mschauer commented 4 years ago

Maybe something for https://discourse.julialang.org/ ?

mschauer commented 4 years ago

Top! https://discourse.julialang.org/t/classical-hypothesis-tests-wald-lr-lm/39575

azev77 commented 4 years ago

@mschauer I posted this as sample code in case someone is working on a PR. Are there plans to include Wald/LR/LM in HypothesisTests.jl?

nalimilan commented 4 years ago

See https://github.com/JuliaStats/StatsModels.jl/pull/162 for the likelihood ratio test. I think other tests for models should go to StatsModels or GLM rather than HypothesisTests, as the latter shouldn't depend on other packages.

azev77 commented 4 years ago

I agree. HypothesisTests.jl shouldn't depend on other packages. My code for the LR test works for any estimated model that supports loglikelihood(m). The only thing it depends on is cdf(Chisq(df), LR).

# Likelihood Ratio Test
function HT_LR(mH0, mHA)
    LR = 2.0*(loglikelihood(mHA) - loglikelihood(mH0))  |> abs
    df = dof_residual(mH0) - dof_residual(mHA)          |> abs
    pv = 1 - cdf(Chisq(df), LR)
    return LR, pv, df
end
pdeffebach commented 4 years ago

I think a trinity tests package would be nice. I would be happy to contribute to this in the future.

PS I think it would also be nice to have a @formula syntax for this, such as @formula(x = 0, y = 0), and build a matrix of restrictions based on that.
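
One way to sketch that idea without formula machinery (a hypothetical helper, not an existing API): map name => value pairs onto the (R, r) pair used by the Wald test above, given the model's coefficient names:

```julia
# Build H0: R*θ = r from simple equality restrictions on named coefficients.
function restrictions(cnames::Vector{String}, hyps::Pair{String,Float64}...)
    R = zeros(length(hyps), length(cnames))
    r = zeros(length(hyps))
    for (i, (name, val)) in enumerate(hyps)
        j = findfirst(==(name), cnames)
        j === nothing && error("unknown coefficient: $name")
        R[i, j] = 1.0
        r[i] = val
    end
    return R, r
end

R, r = restrictions(["(Intercept)", "X1"], "X1" => 0.0)
# recovers the R = [0 1], r = [0] pair from the Wald example above
```

A formula-based front end could then lower @formula(x = 0, y = 0) to calls like this.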

Nosferican commented 4 years ago

Aye. I have code for Wald testing, and there are the hypothesis contrasts in StatsModels that could be expanded. We should work on specifying linear combinations and hypothesis tests based on those.

mschauer commented 4 years ago

Reopening; it makes sense to discuss this here.

azev77 commented 4 years ago

@pdeffebach I'd prefer the classic trinity tests be in this repo.

Can we return to the conversation about where hypothesis tests should live in Julia? To me the most obvious & natural place is in a repo called "HypothesisTests.jl". Just like most distributions are in "Distributions.jl". If not, then what's the purpose of "HypothesisTests.jl"?

I feel like keeping as many hypothesis tests as possible in this repo will encourage others to contribute. It would certainly make me more likely to wanna contribute.

I don't see why dependencies are necessarily an issue. I wrote code for an LRT that doesn't depend on anything except the Chi-squared distribution.

nalimilan commented 4 years ago

Our plan was to move all model-related functions to a new StatsModelsBase package, which would contain the essential parts of the current StatsModels. In that case, HypothesisTests would have to depend on StatsModelsBase to be able to call deviance and co. (In the long term StatsBase is supposed to go away in favor of Statistics + StatsModelsBase and possibly others.) Anyway, I don't see the problem with having these functions in one package or another: you'll have to read the StatsModels documentation to learn how to fit models either way.

Cc: @kleinschmidt

azev77 commented 4 years ago

@nalimilan this is interesting, I wanna learn more.

One advantage of having hypothesis tests together (when possible) is that it makes it easier for the user to discover & compare them. There are often many different tests one can use for a given hypothesis, each with its own advantages. For example, Wald has better power than Score, but Score has better size than Wald. Keeping tests together makes it easier to discover them and compute each test's power/size, etc.

Mathematica has several goodness of fit tests. You can automatically compare results:

[screenshot: Mathematica table comparing goodness-of-fit test results]

Furthermore, they can automate selecting the optimal test for a given dataset: DistributionFitTest[data, dist, Automatic] will choose the most powerful test that applies to data and dist for a general alternative hypothesis.

I feel like these things are easier to cooperate on if we work together in the same repo, whichever repo is decided on...

nalimilan commented 4 years ago

Yes but they could live in StatsModels.

kleinschmidt commented 4 years ago

I don't have much of an opinion one way or another. I suspect that even a trimmed down StatsModelsBase would still contain more than just the function definitions, so I could see being less willing to take it on as a dependency here. But on the other hand I'm sympathetic to the desire to keep hypothesis tests in one place, and this package is the natural place for them.

azev77 commented 4 years ago

@nalimilan @kleinschmidt The score (LM) test requires the score function & information matrix of a model. I've been looking around JuliaStats and found StatsBase.informationmatrix(mHA) and StatsBase.score(mHA). They currently don't work w/ GLM. Will these eventually be able to return the score/information for any appropriate fitted model?

Nosferican commented 4 years ago

I added those to the API as those are necessary components for robust variance-covariance estimators vcov(model; ...). GLM just needs to implement those. I believe Econometrics.jl does implement those.
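
For the Gaussian linear model those two quantities have textbook closed forms, so an implementation in GLM would be small. A sketch (function names here are illustrative, not GLM API):

```julia
using LinearAlgebra

# For y = X*β + ε with ε ~ N(0, σ²I):
#   score(β)       = X' * (y - X*β) / σ²
#   information(β) = X' * X / σ²     (expected Fisher information)
score_lin(X, y, β, σ²) = X' * (y .- X * β) ./ σ²
info_lin(X, σ²) = (X' * X) ./ σ²
```

At the OLS solution the residuals are orthogonal to the columns of X, so the score vanishes, matching the sanity check one would expect from score(fitted_model).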

kleinschmidt commented 4 years ago

StatsModels 0.6.12 has just been released which contains the LR test (from JuliaStats/StatsModels.jl#162)
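
For reference, usage looks roughly like this for a pair of nested models fit on the same data (illustrative data; see the StatsModels docs for the exact API):

```julia
using DataFrames, GLM, StatsModels

df = DataFrame(x = randn(100))
df.y = 1 .+ 2 .* df.x .+ randn(100)

m0 = lm(@formula(y ~ 1), df)      # restricted model
m1 = lm(@formula(y ~ 1 + x), df)  # unrestricted model
lrtest(m0, m1)                    # likelihood-ratio test for the nested pair
```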

Moelf commented 2 years ago

My field uses a lot of likelihood ratio and related test statistics, Wald etc. I wonder if it's worth splitting a package out for that, for reference, see Chapter 3 of: https://arxiv.org/pdf/1007.1727.pdf

Essentially, given a likelihood function L(θs::Vector{Float64})::Float64, our likelihood ratio would be a 1D function of the parameter of interest (POI), call it μ, assumed to be the first among the original θs:

const global_MLL = L(MLE_θs)

function LR(μ::Float64)
    other_θs = # conditional estimator while fixing POI μ
    L(vcat(μ, other_θs)) / global_MLL
end

then we develop test statistics t_0 and t_μ and obtain p-values and intervals by integrating the distribution of the test statistic, etc.
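
As a concrete toy instance of that sketch (purely illustrative): take the mean μ of Gaussian data as the POI and profile out σ as a nuisance parameter, which has a closed-form conditional MLE:

```julia
using Distributions, Statistics

data = [1.2, 0.8, 1.5, 1.1, 0.6]

loglik(μ, σ) = sum(logpdf.(Normal(μ, σ), data))

σ̂(μ) = sqrt(mean((data .- μ) .^ 2))  # conditional MLE of σ at fixed μ
μ̂ = mean(data)                       # global MLE of the POI
global_mll = loglik(μ̂, σ̂(μ̂))

# Profile likelihood ratio λ(μ) ∈ (0, 1]; by Wilks' theorem,
# t(μ) = -2 log λ(μ) is asymptotically Chisq(1) under H0.
λ(μ) = exp(loglik(μ, σ̂(μ)) - global_mll)
t(μ) = -2 * log(λ(μ))
```

λ(μ) equals 1 exactly at μ = μ̂ and decreases away from it, so t(μ) behaves like the t_μ statistic described in the reference.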