JuliaStats / HypothesisTests.jl

Hypothesis tests for Julia
MIT License

Pearson Chi2 test with two vectors #277

Closed Moelf closed 11 months ago

Moelf commented 2 years ago

https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Calculating_the_test-statistic

What's the easiest way to perform this Pearson chi2 test?

wildart commented 2 years ago

Like that sum((o-e)*(o-e)/e for (o,e) in zip(O,E))?
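Spelled out as a runnable snippet, that one-liner looks roughly like the following. This is a sketch in plain Julia with no packages; the function name pearson_chisq is made up for illustration, and the data are the dice counts from the Wikipedia article on Pearson's chi-squared test.

```julia
# Hypothetical helper, not part of HypothesisTests.jl.
# Pearson statistic: sum over categories of (observed - expected)^2 / expected.
function pearson_chisq(O::AbstractVector{<:Real}, E::AbstractVector{<:Real})
    length(O) == length(E) || throw(DimensionMismatch("O and E must have the same length"))
    all(>(0), E) || throw(ArgumentError("expected counts must be positive"))
    return sum((o - e)^2 / e for (o, e) in zip(O, E))
end

O = [5, 8, 9, 8, 10, 20]   # observed counts (dice example)
E = fill(10.0, 6)          # expected counts; floating-point values are fine here

stat = pearson_chisq(O, E)
```

Because the helper works on raw expected counts, E does not need to be integer-valued or normalized to proportions first.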

Moelf commented 2 years ago

Yeah, and preferably also automatically give reduced chi2

wildart commented 2 years ago

Wouldn't you just divide this sum by number of observations?
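One caveat on the divisor: the reduced chi2 is conventionally the statistic divided by the number of degrees of freedom rather than by the number of observations. For a goodness-of-fit test with k categories and no parameters estimated from the data, that is k - 1. A minimal sketch under that assumption, with a hypothetical helper name and the dice counts from the Wikipedia article on Pearson's chi-squared test:

```julia
# Hypothetical helper, not part of HypothesisTests.jl.
# Assumes dof = (number of categories) - 1 - (number of fitted parameters).
function reduced_chisq(O::AbstractVector{<:Real}, E::AbstractVector{<:Real};
                       nfitted::Integer = 0)
    dof = length(O) - 1 - nfitted
    dof > 0 || throw(ArgumentError("need a positive number of degrees of freedom"))
    stat = sum((o - e)^2 / e for (o, e) in zip(O, E))
    return stat / dof
end

O = [5, 8, 9, 8, 10, 20]    # observed counts
E = fill(10.0, 6)           # expected counts
red = reduced_chisq(O, E)   # statistic divided by 5 degrees of freedom
```

The nfitted keyword is an illustrative addition for cases where the expected counts come from a model with parameters fitted to the same data.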

Moelf commented 2 years ago

Yeah but right now we don't have this particular Pearson chi2

wildart commented 2 years ago

Well, an implementation of the Pearson chi-squared test for goodness of fit (the one you mentioned at the top) does exist. You only need to convert the expected counts into proportions and use the two-argument call ChisqTest(O, pᵢ), e.g.

# From https://en.wikipedia.org/wiki/Pearson's_chi-squared_test#Fairness_of_dice
using HypothesisTests

O = [5,8,9,8,10,20]   # observed counts
E = fill(10, 6)       # expected counts = 10
pᵢ = E./sum(E)        # get proportions

ChisqTest(O, pᵢ)

This gives the following result:

julia> t = ChisqTest(O, pᵢ)
Pearson's Chi-square Test
-------------------------
Population details:
    parameter of interest:   Multinomial Probabilities
    value under h_0:         [0.166667, 0.166667, 0.166667, 0.166667, 0.166667, 0.166667]
    point estimate:          [0.0833333, 0.133333, 0.15, 0.133333, 0.166667, 0.333333]
    95% confidence interval: [(0.0, 0.2111), (0.01667, 0.2611), (0.03333, 0.2777), (0.01667, 0.2611), (0.05, 0.2944), (0.2167, 0.4611)]

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           0.0199

Details:
    Sample size:        60
    statistic:          13.400000000000002
    degrees of freedom: 5
    residuals:          [-1.58114, -0.632456, -0.316228, -0.632456, 0.0, 3.16228]
    std. residuals:     [-1.73205, -0.69282, -0.34641, -0.69282, 0.0, 3.4641]

julia> t.stat / t.df # reduced Chi^2
2.6800000000000006

Moelf commented 2 years ago

I see, so maybe a possible change would be, instead of: https://github.com/JuliaStats/HypothesisTests.jl/blob/be980f3ca89908cf63e60307287fe9fad02c47ad/src/power_divergence.jl#L311

to just auto-normalize?

wildart commented 2 years ago

It's possible to add this ChisqTest(x::AbstractVector{T}, y::AbstractVector{T}) where {T<:Integer} override for counts and calculate proportions in it.

Moelf commented 2 years ago

Unfortunately, our expected values are floating-point numbers.

wildart commented 2 years ago

Then you have no other way than to normalize it. Another option would be a keyword parameter for proportions.

nalimilan commented 2 years ago

Can you explain your use case a bit more? I understand you have observed and expected counts and just want to make the Chi2 test from that?

The ChisqTest constructor should really be improved. theta0 should be a keyword argument like for PowerDivergenceTest, otherwise the risk of confusion with y is too high.

I'm reluctant to normalize theta0 automatically, as it could hide bugs, but we could add an argument to enable that as @wildart said, and accept any vector of numbers. FWIW R's chisq.test has p and rescale.p arguments for that.
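To make the opt-in idea concrete, here is a rough sketch of what explicit rescaling could look like on the user side. It is plain Julia and not the package's API: the function name expected_to_proportions and the rescale keyword are assumptions, modeled loosely on rescale.p in R's chisq.test, and the expected counts are made-up illustrative values.

```julia
# Hypothetical helper, not part of HypothesisTests.jl.
# With rescale = false (the default) the input must already sum to one,
# so accidentally passing raw counts raises an error instead of being hidden.
function expected_to_proportions(E::AbstractVector{<:Real}; rescale::Bool = false)
    all(>=(0), E) || throw(ArgumentError("expected values must be non-negative"))
    total = sum(E)
    total > 0 || throw(ArgumentError("expected values must not all be zero"))
    if rescale
        return E ./ total
    end
    isapprox(total, 1) ||
        throw(ArgumentError("expected values sum to $total, not 1; pass rescale = true to normalize"))
    return float.(E)
end

E = [9.5, 10.5, 10.0, 10.0, 9.0, 11.0]   # illustrative floating-point expected counts
p = expected_to_proportions(E; rescale = true)
# p can then be used as the second argument of the two-argument call, ChisqTest(O, p)
```

Keeping the strict check as the default preserves the bug-catching behavior, while rescale = true covers the case of floating-point expected counts.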

Moelf commented 11 months ago

this is fixed