How do we compare different runs with multiple folds per run?

AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

https://amenra.github.io/ranx

MIT License

How do we compare different runs with multiple folds per run? #60

Open celsofranssa opened 7 months ago

celsofranssa commented 7 months ago

How do we compare different runs when each run has multiple folds? For instance, assume we have 10 folds for each of run_1, ..., run_5.

from ranx import compare

# Compare different runs and perform Two-sided Paired Student's t-Test
report = compare(
    qrels=qrels,
    runs=[run_1, run_2, run_3, run_4, run_5],
    metrics=["map@100", "mrr@100", "ndcg@10"],
    max_p=0.01  # P-value threshold
)
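One possible way to frame the multi-fold case, sketched under a strong assumption: the folds come from k-fold cross-validation, so every query is evaluated in exactly one fold. Under that assumption the per-fold results of a model can be merged into a single run before calling compare. The helper `merge_folds` and the toy fold data below are hypothetical and not part of ranx; the run layout (query id mapped to document id mapped to score) is the dictionary form ranx accepts.

```python
from typing import Dict, List

# query_id -> {doc_id: score}, the dictionary layout used for runs
RunDict = Dict[str, Dict[str, float]]


def merge_folds(folds: List[RunDict]) -> RunDict:
    """Merge the per-fold results of one model into a single run dictionary.

    Hypothetical helper. Assumes each query appears in exactly one fold
    and raises ValueError if a query shows up in more than one.
    """
    merged: RunDict = {}
    for fold_idx, fold in enumerate(folds):
        for query_id, doc_scores in fold.items():
            if query_id in merged:
                raise ValueError(
                    f"query {query_id!r} appears in more than one fold "
                    f"(seen again in fold {fold_idx})"
                )
            # Copy so the merged run does not alias the fold's inner dict
            merged[query_id] = dict(doc_scores)
    return merged


# Toy data: two folds for one hypothetical model
fold_0 = {"q_1": {"d_1": 0.9, "d_2": 0.4}}
fold_1 = {"q_2": {"d_3": 0.7}}

merged_run_1 = merge_folds([fold_0, fold_1])
```

Each merged dictionary could then be wrapped with `Run.from_dict` and passed to compare as in the snippet above, provided the qrels cover the queries of all folds. If the folds instead share the same queries (for example, repeated training with different seeds), this merge does not apply and a different aggregation would be needed.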

celsofranssa commented 7 months ago

Also, could you provide a simple explanation of how to interpret the report?

#    Model    MAP@100    MRR@100    NDCG@10
---  -------  --------   --------   ---------
a    model_1  0.320ᵇ     0.320ᵇ     0.368ᵇᶜ
b    model_2  0.233      0.234      0.239
c    model_3  0.308ᵇ     0.309ᵇ     0.330ᵇ
d    model_4  0.366ᵃᵇᶜ   0.367ᵃᵇᶜ   0.408ᵃᵇᶜ
e    model_5  0.405ᵃᵇᶜᵈ  0.406ᵃᵇᶜᵈ  0.451ᵃᵇᶜᵈ
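For context on reading this table: as far as the ranx documentation describes it, the letters in the # column label the models, and the superscript letters after a score list the models that the row's model outperforms with statistical significance under the chosen test and max_p. So 0.368ᵇᶜ in the model_1 row would mean model_1 is significantly better than models b and c on NDCG@10. The small sketch below only decodes a cell's text into a score and a list of labels; `decode_cell` and the superscript mapping are hypothetical helpers, not ranx functions.

```python
from typing import List, Tuple

# Mapping from the superscript letters used in the report to row labels
SUPERSCRIPT_TO_LABEL = {"ᵃ": "a", "ᵇ": "b", "ᶜ": "c", "ᵈ": "d", "ᵉ": "e"}


def decode_cell(cell: str) -> Tuple[float, List[str]]:
    """Split a report cell such as '0.368ᵇᶜ' into its score and labels.

    Hypothetical helper: it parses the printed text only and does not
    recompute any statistics.
    """
    score_chars = []
    labels = []
    for ch in cell.strip():
        if ch in SUPERSCRIPT_TO_LABEL:
            labels.append(SUPERSCRIPT_TO_LABEL[ch])
        else:
            score_chars.append(ch)
    return float("".join(score_chars)), labels


# NDCG@10 cell of model_1 from the table above
score, significantly_better_than = decode_cell("0.368ᵇᶜ")
```

Read this way, model_2 (row b) carries no superscripts and is not significantly better than any other model, while model_5 (row e) is significantly better than all four others on every metric shown.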