ACCLAB / DABEST-python

Data Analysis with Bootstrapped ESTimation
https://acclab.github.io/DABEST-python/
Apache License 2.0

0.2.8 much slower than 0.2.7 #91

Closed: DizietAsahi closed this issue 4 years ago

DizietAsahi commented 4 years ago

I updated to v0.2.8 today and noticed my code running much slower than before. This seems to be related to the inclusion of the lqrt test in the results.

- test 1: virtual env with python 3.7.5, pandas 0.25.3, dabest 0.2.7

```python
import numpy as np
import pandas as pd
import dabest

np.random.seed(1234)
df = pd.DataFrame({'Group1': np.random.normal(loc=0, size=(1000,)),
                   'Group2': np.random.normal(loc=1, size=(1000,))})
test = dabest.load(df, idx=['Group1', 'Group2'])
%time print(test.mean_diff)
```


> DABEST v0.2.7
> =============
>              
> Good morning!
> The current time is Tue Dec 31 11:46:00 2019.
> 
> The unpaired mean difference between Group1 and Group2 is 1.03 [95%CI 0.941, 1.11].
> The two-sided p-value of the Mann-Whitney test is 2.63e-97.
> 
> 5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
> The p-value(s) reported are the likelihood(s) of observing the effect size(s),
> if the null hypothesis of zero difference is true.
> 
> To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`
> 
> 
> CPU times: user 558 ms, sys: 5.83 ms, total: 564 ms
> **Wall time: 564 ms**

- test 2: virtual env with python 3.7.5, pandas 0.25.3, dabest 0.2.8

```python
import numpy as np
import pandas as pd
import dabest

np.random.seed(1234)
df = pd.DataFrame({'Group1': np.random.normal(loc=0, size=(1000,)),
                   'Group2': np.random.normal(loc=1, size=(1000,))})
test = dabest.load(df, idx=['Group1', 'Group2'])
%time print(test.mean_diff)
```

> DABEST v0.2.8
> =============
>              
> Good morning!
> The current time is Tue Dec 31 11:47:09 2019.
> 
> The unpaired mean difference between Group1 and Group2 is 1.03 [95%CI 0.941, 1.11].
> The two-sided p-value of the Mann-Whitney test is 2.63e-97.
> 
> 5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
> The p-value(s) reported are the likelihood(s) of observing the effect size(s),
> if the null hypothesis of zero difference is true.
> 
> To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`
>
>
> CPU times: user 2.46 s, sys: 8.69 ms, total: 2.47 s
> **Wall time: 2.47 s**

Would it be possible to delay the statistical tests until `effect_size.statistical_tests` is called, instead of calculating all the tests up front?
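
For illustration, the deferral being requested could look like the minimal sketch below; the class and attribute names are hypothetical, not dabest's actual internals.

```python
import numpy as np
from scipy import stats

# Hypothetical sketch of deferred ("lazy") statistical tests; names are
# illustrative only, not dabest's real classes.
class EffectSize:
    def __init__(self, control, test):
        self._control = np.asarray(control)
        self._test = np.asarray(test)
        self._statistical_tests = None  # not computed until requested

    @property
    def statistical_tests(self):
        # Run the expensive tests only on first access, then cache the result.
        if self._statistical_tests is None:
            self._statistical_tests = {
                'mann_whitney': stats.mannwhitneyu(
                    self._control, self._test, alternative='two-sided'),
            }
        return self._statistical_tests
```

Under such a scheme, constructing the object and printing the mean difference would no longer pay for the slow tests; only the first access to `.statistical_tests` would.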
josesho commented 4 years ago

Hi @DizietAsahi , thanks for the suggestion. I'll see what refactoring needs to be done, otherwise happy to accept a PR from you.

Ideally, the Lq-RT tests should use the efficient bootstrapping employed elsewhere in dabest; this performance deficit suggests they don't. It might be worth refactoring around the original lqrt package so that the test results are derived from the bootstraps already generated as part of the other calculations.
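
The general idea of sharing one set of bootstrap resamples across several statistics, rather than each test re-running its own bootstrap loop, can be sketched as follows (a generic illustration, not lqrt's or dabest's actual code):

```python
import numpy as np

rng = np.random.RandomState(12345)
x = rng.normal(0, 1, 1000)
y = rng.normal(1, 1, 1000)

n_boot = 5000
# Draw the resample indices once...
ix = rng.randint(0, len(x), (n_boot, len(x)))
iy = rng.randint(0, len(y), (n_boot, len(y)))

# ...then reuse the same resamples for every statistic that needs them.
boot_mean_diff = y[iy].mean(axis=1) - x[ix].mean(axis=1)
boot_median_diff = np.median(y[iy], axis=1) - np.median(x[ix], axis=1)
```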

josesho commented 4 years ago

Looping @adam2392 in.

For v0.2.9, I will implement approximate permutation tests (aka Monte Carlo permutation tests) as the default reported test when `dabest_object.mean_diff` is called.
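
A Monte Carlo permutation test on a mean difference is only a few lines; the sketch below is a generic illustration of the approach, not the eventual dabest implementation.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of means."""
    rng = np.random.RandomState(seed)
    observed = np.mean(y) - np.mean(x)
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly relabel the pooled observations
        diff = pooled[len(x):].mean() - pooled[:len(x)].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    return hits / n_perm
```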

I agree with @DizietAsahi that with high Ns, or with several pairwise comparisons, the Lq-RT Python implementation bogs down the computation unnecessarily. (In the example above, there is a 2x increase in computation time; with 6 pairwise comparisons, there is a 10x increase.)

Therefore, I am strongly inclined to remove the Lq-RT feature in v0.2.9, because:

- Lq-RT tests can still be used together with the dabest package, with a little bit of custom code, to output the desired statistics in parallel (see the sketch below). @adam2392, feel free to DM me if you need help with that.
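
Such custom code might look like the sketch below, assuming the standalone lqrt package exposes an independent-samples test as `lqrt.lqrtest_ind` (check its documentation for the exact signature):

```python
import numpy as np
import pandas as pd
import dabest
import lqrt  # the standalone Lq-RT package

np.random.seed(1234)
df = pd.DataFrame({'Group1': np.random.normal(loc=0, size=(1000,)),
                   'Group2': np.random.normal(loc=1, size=(1000,))})

# Estimation statistics from dabest, as before...
test = dabest.load(df, idx=['Group1', 'Group2'])
print(test.mean_diff)

# ...with the Lq-RT test run separately, only when it is actually wanted.
print(lqrt.lqrtest_ind(df['Group1'].values, df['Group2'].values))
```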

adam2392 commented 4 years ago

> Looping @adam2392 in.
>
> For v0.2.9, I will implement approximate permutation tests (aka Monte Carlo permutation tests) as the default reported test when `dabest_object.mean_diff` is called.
>
> I agree with @DizietAsahi that with high Ns, or with several pairwise comparisons, the Lq-RT Python implementation bogs down the computation unnecessarily. (In the example above, there is a 2x increase in computation time; with 6 pairwise comparisons, there is a 10x increase.)
>
> Therefore, I am strongly inclined to remove the Lq-RT feature in v0.2.9, because:

Okay, that sounds fine, especially given that there are most likely some optimizations to be done.

> Lq-RT tests can still be used together with the dabest package, with a little bit of custom code, to output the desired statistics in parallel. @adam2392, feel free to DM me if you need help with that.

Yes, this would be helpful. Can you show me how to do that?

I imagine we could even build an API for adding any additional "hypothesis test" via a lambda function?
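
No such hook exists in dabest 0.2.x; a hypothetical sketch of the idea, with every name invented for illustration, might be:

```python
import numpy as np
from scipy import stats

# Hypothetical pluggable-test API; nothing like this exists in dabest 0.2.x.
def mean_diff_report(x, y, test_func):
    """Report the mean difference plus any user-supplied two-sample test."""
    stat, pvalue = test_func(x, y)
    return {'mean_diff': np.mean(y) - np.mean(x),
            'test_statistic': stat,
            'pvalue': pvalue}

rng = np.random.RandomState(0)
x, y = rng.normal(0, 1, 100), rng.normal(1, 1, 100)
print(mean_diff_report(x, y, lambda a, b: stats.ttest_ind(a, b)))
```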

adam2392 commented 4 years ago

I can help improve the lqrt testing, perhaps in version 0.3.x+? I am a bit strapped for time right now.

I do think more robust statistics down the line would be helpful to everyone since the package is so nice to use :)

josesho commented 4 years ago

@adam2392, I think a utility function can be bundled in for v0.2.9. Stay tuned! Closing this for now.