JuliaStats / HypothesisTests.jl

Hypothesis tests for Julia
MIT License
295 stars 87 forks source link

DomainError from sqrt in ChisqTest #281

Open andreasnoack opened 1 year ago

andreasnoack commented 1 year ago

Continued from @joshday's comment https://github.com/JuliaStats/HypothesisTests.jl/issues/125#issuecomment-1310188699 in this new issue since the underlying bug is different. The reproducer is

julia> data = [
           946 2;
           1740 4;
           735 2;
           1390 3;
           203 0;
           2628 6
       ];

julia> ChisqTest(data)
Pearson's Chi-square Test
-------------------------
Population details:
    parameter of interest:   Multinomial Probabilities
    value under h_0:         [0.123501, 0.227201, 0.0960131, 0.181474, 0.0264459, 0.343146, 0.000274734, 0.000505419, 0.000213586, 0.000403697, 5.88303e-5, 0.000763344]
    point estimate:          [0.123515, 0.227184, 0.0959655, 0.181486, 0.0265048, 0.343126, 0.000261131, 0.000522261, 0.000261131, 0.000391696, 0.0, 0.000783392]
Error showing value of type PowerDivergenceTest:
ERROR: DomainError with -5.442823412516082:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
Stacktrace:
  [1] throw_complex_domainerror(f::Symbol, x::Float64)
    @ Base.Math ./math.jl:33
  [2] sqrt
    @ ./math.jl:591 [inlined]
  [3] ci_sison_glaz(x::PowerDivergenceTest, alpha::Float64; skew_correct::Bool)
    @ HypothesisTests ~/.julia/dev/HypothesisTests/src/power_divergence.jl:188
  [4] confint(x::PowerDivergenceTest; level::Float64, tail::Symbol, method::Symbol, correct::Bool, bootstrap_iters::Int64, GC::Bool)
    @ HypothesisTests ~/.julia/dev/HypothesisTests/src/power_divergence.jl:90
  [5] confint
    @ ~/.julia/dev/HypothesisTests/src/power_divergence.jl:70 [inlined]
  [6] show(_io::IOContext{Base.TTY}, test::PowerDivergenceTest)
    @ HypothesisTests ~/.julia/dev/HypothesisTests/src/HypothesisTests.jl:129
  [7] show(io::IOContext{Base.TTY}, #unused#::MIME{Symbol("text/plain")}, x::PowerDivergenceTest)
    @ Base.Multimedia ./multimedia.jl:47
  [8] (::REPL.var"#43#44"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:267
  [9] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:521
 [10] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:260
 [11] display(d::REPL.REPLDisplay, x::Any)
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:272
 [12] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:328
 [13] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [14] invokelatest
    @ ./essentials.jl:726 [inlined]
 [15] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:296
 [16] (::REPL.var"#45#46"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:278
 [17] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:521
 [18] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:276
 [19] (::REPL.var"#do_respond#66"{Bool, Bool, REPL.var"#77#87"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:857
 [20] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [21] invokelatest
    @ ./essentials.jl:726 [inlined]
 [22] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/LineEdit.jl:2510
 [23] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL ~/.julia/juliaup/julia-1.8.1+0.x64/share/julia/stdlib/v1.8/REPL/src/REPL.jl:1248
 [24] (::REPL.var"#49#54"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL ./task.jl:484

The issue is with this function https://github.com/JuliaStats/HypothesisTests.jl/blob/a89145162648df8b37303894211c0ce7f197f065/src/power_divergence.jl#L133-L182 which is based on the SAS script from https://www.jstatsoft.org/index.php/jss/article/view/v005i06/621. I'm not sure what the issue is.

ParadaCarleton commented 1 year ago

Having the same problem.

julia> approx_contingency = [
           (2609 - 325) * .27        325 * .16;
           (2609 - 325) * (1-.27)    325 * (1-.16);
       ]
2×2 Matrix{Float64}:
  616.68   52.0
 1667.32  273.0

julia> contingency = round.(Int, approx_contingency)
2×2 Matrix{Int64}:
  617   52
 1667  273

julia> ChisqTest(contingency)
Pearson's Chi-square Test
-------------------------
Population details:
    parameter of interest:   Multinomial Probabilities
    value under h_0:         [0.224478, 0.650953, 0.0319419, 0.0926269]
    point estimate:          [0.236489, 0.638942, 0.019931, 0.104638]
Error showing value of type PowerDivergenceTest:
ERROR: DomainError with -1.4430408161224477:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).

julia> MultinomialLRTest(contingency)
Multinomial Likelihood Ratio Test
---------------------------------
Population details:
    parameter of interest:   Multinomial Probabilities
    value under h_0:         [0.224478, 0.650953, 0.0319419, 0.0926269]
    point estimate:          [0.236489, 0.638942, 0.019931, 0.104638]
Error showing value of type PowerDivergenceTest:
ERROR: DomainError with -1.4430408161224477:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
andreasnoack commented 1 year ago

If somebody with access to SAS could try out the SAS macro, it would be a useful data point to know if the SAS macro fails in the same way or if our translation has introduced the issue.

ParadaCarleton commented 1 year ago

If somebody with access to SAS could try out the SAS macro, it would be a useful data point to know if the SAS macro fails in the same way or if our translation has introduced the issue.

I don't have access to SAS, but if the specific confidence procedure isn't critical here, I think a Bayesian credible interval with an uninformative prior would be dramatically simpler and work just as well, if not better. The solution is just a conjugate Dirichlet distribution.

devmotion commented 1 year ago

In general, credible intervals are no confidence intervals and don't satisfy any coverage guarantees.

ParadaCarleton commented 1 year ago

In general, credible intervals are no confidence intervals and don't satisfy any coverage guarantees.

Not in general, but they do asymptotically for simple problems like this one. For binomial confidence intervals there's not really any popular interval constructions with exact coverage guarantees; all popular intervals are based on the Central Limit theorem. The uninformative prior credible interval is widely-used here, even by frequentists, because the frequentist coverage of the procedure is good even for very small samples--see e.g. this paper.