cdowd / twosamples

Fast Permutation Based Two Sample Tests
https://twosampletest.com

One-sided alternatives of stat functions? #7

Open · sircosine opened this issue 4 years ago

sircosine commented 4 years ago

Hi,

I wonder if it would be useful to add an option for one-sided versions of the stat functions? That would be similar in spirit to the alternative parameter of the t.test function, with two.sided as the default.

There are about half a dozen places in the code where the sign of the difference between two CDFs is corrected:

    // flip a negative height (the difference between the two ECDFs) --
    // a lazy version of taking the absolute value
    if (height < 0.0) {
      height = -1.0*height;
    }

whereas in the one-sided alternative the negative height would simply be set to zero.
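
For concreteness, here is a minimal sketch in R of the one-sided variant I have in mind (ks_stat_one_sided is hypothetical, not a twosamples function):

    # Hypothetical one-sided KS-style statistic: rather than taking the
    # absolute value of the ECDF difference, negative heights become zero.
    ks_stat_one_sided = function(x, y) {
      joint = sort(c(x, y))
      height = ecdf(x)(joint) - ecdf(y)(joint)  # difference between the two ECDFs
      max(pmax(height, 0))                      # keep only the positive side
    }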

Any thoughts?

Thanks!

cdowd commented 4 years ago

Hi sircosine.

I appreciate the suggestion. I should start by mentioning that I'm in the thick of my dissertation writing at the moment, and so it is unlikely I will release a new version before December. Thus, if this is something you need in the short term, I suggest you just copy my functions and change them as desired.

I'll follow that by saying I don't think I fully understand your notation. I would typically describe all of the tests I've included as one-sided tests (see the README for an example). Because all of the test statistics are non-negative, all the p-values are calculated by looking at the portion of the null distribution that is larger than the observed value (the right side). Looking at the proportion that is smaller (the left side) would, for most of these test stats, correspond to testing some hyper-regularity condition (e.g., imagine a re-ordered sequence vs. draws from a uniform -- one will be more 'regularly' spaced). So in that sense, all these test statistics are already 'one-sided,' and two-sided alternatives are for somewhat unusual questions (and could be calculated by subtracting the reported p-values from 1).
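
To make that last parenthetical concrete, a quick sketch (assuming the returned vector's second element is the p-value):

    # Sketch of the left-tail calculation described above.
    # Assumes the second element of the test's output is the p-value.
    library(twosamples)
    set.seed(1)
    x = rnorm(100); y = rnorm(100)
    out = dts_test(x, y)   # reports a right-tail p-value
    left_p = 1 - out[[2]]  # left tail: the 'hyper-regularity' alternative
    left_p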

More importantly, though, I'm struggling to understand the situation in which you would want to do this. As with a t-test, when you expect a mean shift in one particular direction you would gain power by making this change -- but in that case, you are also better off doing a t-test. Is the notion that you think there is a mean shift in some direction, but there could also be other distributional changes? That is, you want the power gain from switching to a uni-directional mean-shift test, but also the CDF tests' ability to detect other distributional changes? It's worth noting here that ditching half of the CDF differences will undermine the ability to detect those other changes, and the power against a mean+variance shift may well be less than that of the standard CDF tests.
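
One rough way to check the "better off with a t-test" intuition is a small power simulation under a pure mean shift. This is only a sketch (it assumes dts_test's nboots argument; small reps and nboots are used for speed):

    # Sketch: compare rejection rates of the t-test and dts_test under a pure
    # mean shift. Small reps/nboots keep this quick; it is not a careful study.
    set.seed(2)
    reps = 200
    rej_t = rej_dts = 0
    for (i in 1:reps) {
      x = rnorm(50); y = rnorm(50, 0.5)
      rej_t   = rej_t   + (t.test(x, y)$p.value < 0.05)
      rej_dts = rej_dts + (twosamples::dts_test(x, y, nboots = 500)[[2]] < 0.05)
    }
    c(t_power = rej_t / reps, dts_power = rej_dts / reps)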

I'm open to it. It's not crazy that this could be useful, but if you could describe the use case to me, that would go a long way towards selling me on it.

Best,
Connor

sircosine commented 4 years ago

Hi,

Thanks for your comments. The use case I had in mind (and should have explained better) was indeed one where a mean shift in only one direction is interesting, while the other is ignorable. Visually, in a PDF plot, I would care only about measuring how much probability mass moved to the right, and ignore the part that moved to the left. But you are right, maybe twosamples isn't the right place for this, so please feel free to close this issue as infeasible.

Thank you anyway, and best of luck with your dissertation!

cdowd commented 4 years ago

I appreciate the luck -- I need every ounce. I don't think it is infeasible, or even wrong. I do think that mostly you're going to be better off with the t-test though. We know that the t-test is power optimal under the conditions you described.

If you're worried about a finite-sample issue with your t-test, I would recommend using the permutation_test_builder function (it's a twosamples internal, but the code is here) along with a simple t-test function to build a properly sized finite-sample t-test. See below for example code. Validity will require exchangeability -- a somewhat stronger condition than is usual for a t-test (or for the rest of this package) -- but you'll remove any need for asymptotics. I did initially consider including that exact test in the package as a function, but because it operates under a fundamentally different framework than the others (testing equality of means, not equality of distributions), I decided to leave it out.

# a scaled mean difference: a monotone transform of the t-stat, which is all
# a permutation test needs
t_stat = function(x, y) (mean(x) - mean(y)) / sqrt(var(x) + var(y))
# permutation_test_builder is not exported, so copy its source or reach it
# via `:::` as here
t_test_finite = twosamples:::permutation_test_builder(t_stat)
x = rnorm(100)
y = rnorm(100, 1)
t_test_finite(x, y)

As an aside for possible future readers, this is a rather good demonstration of how the entire package works. There are C++ functions for calculating test stats, and then all the different test stat functions are plugged into the R function permutation_test_builder. The exchangeability condition mentioned above is implied by a combination of the null (same distribution) and independence.
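
For future readers who want the shape of that design without reading the source, here is a conceptual sketch of such a builder (this is not the package's actual internal code):

    # Conceptual permutation test builder: given any two-sample statistic,
    # return a test that permutes the pooled sample under the null.
    perm_test_builder = function(stat_fn, nboots = 2000) {
      function(x, y) {
        obs = stat_fn(x, y)
        pooled = c(x, y)
        n = length(x)
        boots = replicate(nboots, {
          idx = sample(length(pooled), n)  # relabel under exchangeability
          stat_fn(pooled[idx], pooled[-idx])
        })
        c(test_stat = obs, p_value = mean(boots >= obs))  # right-tail p-value
      }
    }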

XinpeiYi commented 2 years ago

Hi, I have the same problem using the twosamples package. I tried to apply it to a tumor-normal comparison problem, but I found I cannot get the direction from this package. I want to know whether tumor and normal have different distributions, and whether the tumor distribution has higher expression than normal. I think this is the reason one-sided testing was developed. The KS test in base R has a one-sided alternative choice, but the twosamples package doesn't. I think it would be very easy for you to incorporate this option. Can you help me fix it? Thanks!

cdowd commented 2 years ago

Differences

First off, if you would like to detect whether tumor and normal have different distributions, every test function in twosamples is valid and designed for exactly that purpose. I recommend dts_test, but the others work as well. I should mention that your description suggests this is a well-known example, but I am unfamiliar with it and may well have missed some critical implication.

Direction

As for determining "direction", I agree that from a computational perspective it is quite easy to incorporate. Each of the test stats has a line checking for a negative "height" (the difference between the ECDFs) and reversing its sign in that case (as a lazy version of an absolute value). It would be straightforward to instead set the height to zero -- thus creating a one-sided test -- and to swap the samples when the other side was desired. Indeed, you are more than welcome to do so.
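
Concretely, building on the hypothetical ks_stat_one_sided sketched earlier in this thread, the swap looks like this:

    # Hypothetical directed statistic: reuses the earlier ks_stat_one_sided;
    # swapping the samples tests the opposite direction.
    ks_stat_directed = function(x, y, alternative = c("greater", "less")) {
      alternative = match.arg(alternative)
      if (alternative == "less") return(ks_stat_one_sided(y, x))  # swap samples
      ks_stat_one_sided(x, y)
    }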

However, I'm not convinced this has much value. In your example, it sounds like you want to test for a difference in means -- in which case you should seriously consider using a t-test.

More broadly, interpretation of a one-sided KS test (and of the derivatives in this package) is either the same as for the two-sided test ("these are different distributions") or it is not straightforward. It does not mean that the mean is shifted in one direction or the other. E.g. the following line of code, which runs a one-sided KS test (base R version) on two normal distributions with the same mean, rejects thoroughly.

stats::ks.test(rnorm(500,0,1),rnorm(500,0,2),alternative="less")

[Screenshot: R output of the ks.test call above, rejecting the null]

As you can see, the one-sided KS test is picking up on the change in variance. Similarly, it could pick up on a shift in skew, kurtosis, or the 1245th moment -- which is not interpretable to me, but does not require a change in the mean. Indeed, the situation is worse than that example suggests, as we can also see cases where the KS test detects "below" but the mean is in fact above.

stats::ks.test(rnorm(500,0,1),rnorm(500,0.2,2),alternative="less")

[Screenshot: R output of the ks.test call above, again rejecting]

To the best of my knowledge, this leaves you making statements like "the one-sided KS test detected a difference between the distributions" -- which is no stronger than the comparable statement implied by the two-sided KS test. But the test statistic you are using to make that (exact same) statement is weaker, because it can't leverage changes "on the other side" to detect these differences.

Admittedly, as the R output shows, you can strengthen that statement slightly to "the CDF of X lies below the CDF of Y somewhere" -- a statement the two-sided version cannot make. But I've yet to see a situation where this is a helpful statement (I'm open to examples). It wouldn't be enough to claim that "the tumor distribution has higher expression", if by "higher expression" you intend any kind of statement about means, medians, or any other summary statistic. Recapping slightly: this is because, as we saw, the difference in means may well be the reverse of the detected difference in CDFs.

Conclusion

In conclusion, as far as I can tell, the one-sided tests throw out a ton of valuable information, thus substantially reducing their power to detect differences, in order to make statements that are mostly the same, or at best incrementally stronger. And that incrementally stronger statement is stronger in ways I've never seen leveraged for any useful purpose when it is correctly described.

This leaves 3 avenues for convincing me this is a good idea:

  1. Show that the power loss is minimal. (Extremely unlikely.)
  2. Convince me that my interpretation of what the stronger statement can be is wrong. (Reasonably likely.)
  3. Show me an example where my current interpretation of that stronger statement is useful or helpful, above and beyond "these are different distributions". (Possible, somewhat unlikely.)

I do intend to leave this issue open on the basis that someone may convince me along any of those lines at any point.

Further Advice

To diagnose the source of differences in KS test statistics, I suggest plotting the ECDFs. (I intend to release some better plotting tools for the package in the near-ish future based on the plots I've had to make).

plot(ecdf(rnorm(500, 0.2, 2)), pch = NA)
lines(ecdf(rnorm(500, 0, 1)), col = 2, pch = NA)

[Screenshot: the two overlaid ECDF curves]

If you are in the position where a t-test fails to reject but one of the functions in this package does reject, it is likely because what you are seeing is not a difference in means but some other difference between the distributions. These test statistics are not better than a t-test at spotting mean differences (see the companion paper), and most of them are worse.