[BUG] ttest_rel() got an unexpected keyword argument 'alternative' when using compare with stat_test="student"

PaulLerner commented 2 years ago

Describe the bug

Hi, I get an error when using compare with stat_test="student" (no problem when using the default "fisher").

TypeError                                 Traceback (most recent call last)
<ipython-input-3-0369b81922de> in <module>
      7     metrics=["map@100", "mrr@100", "ndcg@10"],
      8     stat_test="student",
----> 9     max_p=0.01  # P-value threshold
     10 )

/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/meta/compare.py in compare(qrels, runs, metrics, stat_test, n_permutations, max_p, random_seed, threads, rounding_digits, show_percentages)
    100         n_permutations=n_permutations,
    101         max_p=max_p,
--> 102         random_seed=random_seed,
    103     )
    104 

/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/statistical_tests/__init__.py in compute_statistical_significance(model_names, metric_scores, stat_test, n_permutations, max_p, random_seed)
     81                         n_permutations,
     82                         max_p,
---> 83                         random_seed,
     84                     )
     85 

/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/statistical_tests/__init__.py in _compute_statistical_significance(control_metric_scores, treatment_metric_scores, stat_test, n_permutations, max_p, random_seed)
     38         elif stat_test == "student":
     39             p_value, significant = paired_student_t_test(
---> 40                 control_metric_scores[m], treatment_metric_scores[m], max_p,
     41             )
     42 

/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/statistical_tests/paired_student_t_test.py in paired_student_t_test(control, treatment, max_p)
     11 
     12     """
---> 13     _, p_value = ttest_rel(control, treatment, alternative="two-sided")
     14 
     15     return p_value, p_value <= max_p

TypeError: ttest_rel() got an unexpected keyword argument 'alternative'

To Reproduce

In [1]: from ranx import Qrels, Run
   ...:
   ...: qrels_dict = { "q_1": { "d_12": 5, "d_25": 3 },
   ...:                "q_2": { "d_11": 6, "d_22": 1 } }
   ...:
   ...: run_dict = { "q_1": { "d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
   ...:                       "d_36": 0.6, "d_32": 0.5, "d_35": 0.4  },
   ...:              "q_2": { "d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
   ...:                       "d_36": 0.6, "d_22": 0.5, "d_35": 0.4  } }
   ...:
   ...: qrels = Qrels(qrels_dict)
   ...: run = Run(run_dict)

In [2]: from ranx import compare
   ...:
   ...: # Compare different runs and perform statistical tests
   ...: report = compare(
   ...:     qrels=qrels,
   ...:     runs=[run, run],
   ...:     metrics=["map@100", "mrr@100", "ndcg@10"],
   ...:     stat_test="student",
   ...:     max_p=0.01  # P-value threshold
   ...: )

Versions

ranx==0.2.8

AmenRa commented 1 year ago

Hi, I suspect your SciPy version is lower than 1.6.0, the release that added the alternative keyword argument to ttest_rel (see here).

I will specify the minimum required SciPy version in ranx's setup file so that this does not happen again.
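
For illustration, a minimum-version constraint in a setuptools-based setup file might look like the sketch below. The `INSTALL_REQUIRES` name and the commented `setup()` call are assumptions, not ranx's actual setup file; only the 1.6.0 lower bound comes from this thread.

```python
# Hypothetical fragment of a setup.py; the names here are illustrative only.
# The lower bound is the SciPy release that added the `alternative`
# keyword argument to scipy.stats.ttest_rel.
INSTALL_REQUIRES = [
    "scipy>=1.6.0",
]

# In a real setup.py this list would be handed to setuptools, e.g.:
# setup(..., install_requires=INSTALL_REQUIRES)
```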

In the meantime, updating your SciPy installation should fix that.
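
A quick way to check whether the installed SciPy is recent enough is sketched below. The helper name `supports_alternative` is hypothetical, and the parsing assumes a version string that starts with numeric `major.minor` components.

```python
from importlib.metadata import PackageNotFoundError, version


def supports_alternative(scipy_version: str) -> bool:
    """Return True if this SciPy version accepts ttest_rel(..., alternative=...)."""
    # Compare only the numeric major and minor components against 1.6.
    major, minor = (int(part) for part in scipy_version.split(".")[:2])
    return (major, minor) >= (1, 6)


try:
    installed = version("scipy")
    print(f"SciPy {installed}: alternative supported = {supports_alternative(installed)}")
except PackageNotFoundError:
    print("SciPy is not installed in this environment.")
```

If the check reports that the keyword is unsupported, upgrading SciPy to 1.6.0 or later in the same environment should resolve the TypeError.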

PaulLerner commented 1 year ago

Oh, I didn't see that ttest_rel came from SciPy. But my SciPy version is 1.7.2…

PaulLerner commented 1 year ago

Oops, no, it looks like you're right: my SciPy version is 1.5.3. My pip got messed up.
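
One possible cause of such a mismatch is pip installing into a different environment than the one the interpreter actually uses. The sketch below (the helper `describe_scipy` is a hypothetical name) reports which interpreter is running and where it would import SciPy from, without importing SciPy itself.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def describe_scipy() -> dict:
    """Report the running interpreter and the SciPy it would import, if any."""
    spec = find_spec("scipy")  # None when SciPy is not importable
    try:
        installed = version("scipy")
    except PackageNotFoundError:
        installed = None
    return {
        "python": sys.executable,
        "scipy_version": installed,
        "scipy_location": spec.origin if spec is not None else None,
    }


for key, value in describe_scipy().items():
    print(f"{key}: {value}")
```

Running pip through the same interpreter, as in `python -m pip install -U scipy`, makes sure the upgrade lands in the environment reported here.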

PaulLerner commented 1 year ago

Fixed with scipy==1.7.3, thanks for your help :)

AmenRa / ranx #25