JuliaStats / HypothesisTests.jl

Hypothesis tests for Julia

one-sided MannWhitneyUTest #315

Closed gdmcbain closed 4 months ago

gdmcbain commented 4 months ago

MannWhitneyUTest doesn't seem to have an option for anything other than the two-sided test, unlike scipy.stats.mannwhitneyu, which has an optional keyword argument alternative: {'two-sided', 'less', 'greater'}.

I noticed this when attempting to reproduce the example from §9.2.4 ‘Example: Rank-Sum Test for Data with One Reporting Limit’ of Helsel's text.

The example concerns "censored" data, in that some measurements fall below a detection limit. Those measurements are treated, for the purpose of applying an ordinal test, by replacing them with the detection limit; i.e., they're all deemed tied. This isn't the issue here; that's easy:

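using HypothesisTests

# A left-censored measurement: the true value lies below the reporting limit x.
struct LT
  x::Float64
end

const Censorable = Union{Float64,LT}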

groups = Vector{Censorable}[
  [LT(0.2), 1.5, LT(0.2), LT(0.2)],
  [3.4, 1.9, 3.7, 2.1, 3.2, 2.4, 1.2, 4.1, 1.9, 0.6]
]

highest_reporting_limit(g::Vector{Censorable}) = maximum(q.x for q in g if isa(q, LT))

function HypothesisTests.MannWhitneyUTest(x::Vector{Censorable}, y::Vector{Censorable})
  joint = vcat(x, y)
  limit = highest_reporting_limit(joint)
  x1, y1 = [[((isa(z, LT) || z <= limit) ? limit : z) for z in g] for g in [x, y]]
  MannWhitneyUTest(x1, y1)
end

@show MannWhitneyUTest(groups...)

This replaces the given censored data groups with:

x1 = [0.2, 1.5, 0.2, 0.2]
y1 = [3.4, 1.9, 3.7, 2.1, 3.2, 2.4, 1.2, 4.1, 1.9, 0.6]

and then MannWhitneyUTest(x1, y1) reports:

Approximate Mann-Whitney U test
-------------------------------
Population details:
    parameter of interest:   Location parameter (pseudomedian)
    value under h_0:         0
    point estimate:          -2.05

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           0.0128

Details:
    number of observations in each group: [4, 10]
    Mann-Whitney-U statistic:             2.0
    rank sums:                            [12.0, 93.0]
    adjustment for ties:                  30.0
    normal approximation (μ, σ):          (-18.0, 7.03211)
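
The figures under "Details" can be checked by hand. The following is a minimal sketch in standard-library Python, assuming the usual tie-corrected normal approximation for the rank-sum statistic. The function names `midranks` and `rank_sum_summary` are hypothetical, and the formulas are chosen because they reproduce the reported numbers, not because they are taken from the package source. Under that assumption, the reported μ is the deviation of U from its null mean nx·ny/2 rather than the null mean itself.

```python
from collections import Counter
from math import sqrt


def midranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_summary(x, y):
    """Rank sums, U, tie adjustment and normal-approximation parameters."""
    nx, ny = len(x), len(y)
    n = nx + ny
    joint = list(x) + list(y)
    ranks = midranks(joint)
    rank_sums = (sum(ranks[:nx]), sum(ranks[nx:]))
    u = rank_sums[0] - nx * (nx + 1) / 2
    # Sum of t^3 - t over groups of t tied values (untied values contribute 0).
    tie_adjustment = sum(t**3 - t for t in Counter(joint).values())
    mu = u - nx * ny / 2  # deviation of U from its mean under the null
    sigma = sqrt(nx * ny / 12 * (n + 1 - tie_adjustment / (n * (n - 1))))
    return {"U": u, "rank_sums": rank_sums, "tie_adjustment": tie_adjustment,
            "mu": mu, "sigma": sigma}


x1 = [0.2, 1.5, 0.2, 0.2]
y1 = [3.4, 1.9, 3.7, 2.1, 3.2, 2.4, 1.2, 4.1, 1.9, 0.6]
summary = rank_sum_summary(x1, y1)
print(summary)
```

For these data the sketch gives U = 2.0, rank sums (12.0, 93.0), a tie adjustment of 30 (three values tied at 0.2 and two at 1.9), μ = -18.0 and σ ≈ 7.03211, in agreement with the output above.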

i.e., a p-value of 0.0128, whereas the text gives 0.0064. I noticed that SciPy reproduces both values, depending on alternative:

from scipy.stats import mannwhitneyu

for alt in ['two-sided', 'less', 'greater']:
    print(mannwhitneyu([0.2, 1.5, 0.2, 0.2], [3.4, 1.9, 3.7, 2.1, 3.2, 2.4, 1.2, 4.1, 1.9, 0.6], alternative=alt))
MannwhitneyuResult(statistic=2.0, pvalue=0.012825255684392021)
MannwhitneyuResult(statistic=2.0, pvalue=0.006412627842196011)
MannwhitneyuResult(statistic=2.0, pvalue=0.9957406660823311)

so it looks like Helsel's result matches alternative='less'. Rereading Helsel (p. 159):

Of interest is whether concentrations are higher in the wells affected by irrigation. …

… The Mann–Whitney (or rank-sum) test can be easily applied to these data, without any changes. This should be set up as a one-sided test (the question was “Are concentrations higher in wells affected by irrigation?”), so a difference in only one direction is of interest. The results below show that DOC concentrations are higher in irrigation-influenced wells, with a p-value of 0.0064.

HypothesisTests.MannWhitneyUTest does correctly give p = 0.0520 for the earlier (two-sided) example in §9.2.2.

Should HypothesisTests.MannWhitneyUTest be extended with something like SciPy's alternative keyword option?

gdmcbain commented 4 months ago

I think R's equivalent is wilcox.test; it also has

alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".

gdmcbain commented 4 months ago

Oh, I see! It's already there! It's just a matter of passing the tail keyword to pvalue along with the test result; for this example, tail=:left gives the one-sided value from the text.

x = [0.2, 1.5, 0.2, 0.2]
y = [3.4, 1.9, 3.7, 2.1, 3.2, 2.4, 1.2, 4.1, 1.9, 0.6]
for tail ∈ [:both, :left, :right]
    @show pvalue(MannWhitneyUTest(x, y), tail=tail)
end

gives

pvalue(MannWhitneyUTest(x, y), tail = tail) = 0.012825255684392028
pvalue(MannWhitneyUTest(x, y), tail = tail) = 0.006412627842196014
pvalue(MannWhitneyUTest(x, y), tail = tail) = 0.9957406660823311
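
These three values are consistent with a normal approximation that uses a continuity correction of 0.5. That correction is an assumption here: it reproduces the numbers in this thread, but it is not stated anywhere in it. The sketch below uses the (μ, σ) pair from the test output earlier in the thread; the function names `normal_cdf`, `normal_sf` and `approx_pvalues` are hypothetical, and the dictionary keys follow SciPy's alternative names.

```python
from math import erfc, sqrt


def normal_cdf(z):
    """Standard normal lower-tail probability P(Z <= z)."""
    return 0.5 * erfc(-z / sqrt(2))


def normal_sf(z):
    """Standard normal upper-tail probability P(Z > z)."""
    return 0.5 * erfc(z / sqrt(2))


def approx_pvalues(mu, sigma, correction=0.5):
    """P-values for each alternative, given mu = U - nx*ny/2 and its sigma.

    The continuity correction moves mu toward zero by `correction`
    before standardising (an assumed convention, see the lead-in).
    """
    return {
        "two-sided": 2 * normal_sf(max(abs(mu) - correction, 0.0) / sigma),
        "less": normal_cdf((mu + correction) / sigma),
        "greater": normal_sf((mu - correction) / sigma),
    }


# (mu, sigma) as printed in the "normal approximation" line of the test output.
p = approx_pvalues(-18.0, 7.03211)
for name, value in p.items():
    print(f"{name}: {value:.6f}")
```

Going by the outputs in this thread, 'less' lines up with tail=:left and 'greater' with tail=:right. Because μ is negative here, the 'less' p-value is exactly half of the two-sided one, which accounts for 0.0064 versus 0.0128.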

Thank you.