[Feature Request] stddev statistic

jamm1985 commented 10 months ago

It would be good to also report the spread (sample variance or standard deviation) alongside the average of each metric.

For example: ndcg@50: 11, stddev: 2.8
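Put as a formula: for a metric with per-query scores $m_q$ over the query set $Q$, the request is to report the standard deviation $s$ next to the mean $\bar{m}$. Here $\delta = 0$ gives the population form (divide by $|Q|$) and $\delta = 1$ gives the sample form (divide by $|Q| - 1$); which one a given tool uses is worth checking.

$$
\bar{m} = \frac{1}{|Q|}\sum_{q \in Q} m_q,
\qquad
s = \sqrt{\frac{1}{|Q| - \delta}\sum_{q \in Q}\left(m_q - \bar{m}\right)^2}
$$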

AmenRa commented 10 months ago

Hi! Yes, I should add it.

In the meantime, you can compute it as follows:

from ranx import Qrels, Run, evaluate
import numpy as np

qrels_dict = { "q_1": { "d_12": 5, "d_25": 3 },
               "q_2": { "d_11": 6, "d_22": 1 } }

run_dict = { "q_1": { "d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_32": 0.5, "d_35": 0.4  },
             "q_2": { "d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_22": 0.5, "d_35": 0.4  } }

qrels = Qrels(qrels_dict)
run = Run(run_dict)

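# evaluate() returns the mean scores and also stores the per-query scores in run.scores
evaluate(qrels, run, ["map@5", "mrr"])

# Standard deviation of the per-query scores of each metric
print(np.std(list(run.scores["map@5"].values())))
print(np.std(list(run.scores["mrr"].values())))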
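Note that `np.std` defaults to `ddof=0`, i.e. the population standard deviation, while the request mentions the sample variance. The sketch below shows the difference using only the standard library. The per-query values are the reciprocal ranks of the two toy queries above, worked out by hand (first relevant document at rank 1 for `q_1` and at rank 2 for `q_2`); the variable names are illustrative.

```python
import math
import statistics

# Per-query reciprocal ranks for the toy run: q_1 hits d_12 at rank 1,
# q_2 hits d_11 at rank 2.
per_query_mrr = {"q_1": 1.0, "q_2": 0.5}
values = list(per_query_mrr.values())

# Population std (divides by n); same as np.std(values) with the default ddof=0.
population_std = statistics.pstdev(values)

# Sample std (divides by n - 1); same as np.std(values, ddof=1).
sample_std = statistics.stdev(values)

print(population_std)  # 0.25
print(sample_std)      # about 0.354
```

With only two queries the two estimates differ noticeably; the gap shrinks as the number of queries grows.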

AmenRa commented 9 months ago

Added support for standard deviation in v0.3.19.

Example:

from ranx import Qrels, Run, evaluate

qrels_dict = { "q_1": { "d_12": 5, "d_25": 3 },
               "q_2": { "d_11": 6, "d_22": 1 } }

run_dict = { "q_1": { "d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_32": 0.5, "d_35": 0.4  },
             "q_2": { "d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_22": 0.5, "d_35": 0.4  } }

qrels = Qrels(qrels_dict)
run = Run(run_dict)

evaluate(qrels, run, ["map@5", "mrr"], return_std=True)

Output:

{'map@5': {'mean': 0.6416666666666666, 'std': 0.19166666666666662},
 'mrr': {'mean': 0.75, 'std': 0.25}}
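To see where these numbers come from, the sketch below recomputes the per-query scores for the toy data in plain Python. It is an illustrative reimplementation, not ranx's code: the helper names are hypothetical, it assumes binary relevance (any positive grade counts as relevant), and it uses the population standard deviation (dividing by the number of queries), which is the choice that reproduces the `std` values in the output above.

```python
import statistics

qrels_dict = {"q_1": {"d_12": 5, "d_25": 3},
              "q_2": {"d_11": 6, "d_22": 1}}

run_dict = {"q_1": {"d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
                    "d_36": 0.6, "d_32": 0.5, "d_35": 0.4},
            "q_2": {"d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
                    "d_36": 0.6, "d_22": 0.5, "d_35": 0.4}}


def ranked_docs(scores):
    # Doc ids sorted by descending retrieval score.
    return sorted(scores, key=scores.get, reverse=True)


def average_precision_at_k(relevant, ranking, k):
    # Add precision@i at every rank i <= k that holds a relevant doc,
    # then divide by the total number of relevant docs for the query.
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def reciprocal_rank(relevant, ranking):
    # 1 / rank of the first relevant doc, or 0 if none is retrieved.
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


per_query = {"map@5": {}, "mrr": {}}
for q_id, judged in qrels_dict.items():
    # Assumption: any positive grade counts as relevant.
    relevant = {doc for doc, grade in judged.items() if grade > 0}
    ranking = ranked_docs(run_dict[q_id])
    per_query["map@5"][q_id] = average_precision_at_k(relevant, ranking, 5)
    per_query["mrr"][q_id] = reciprocal_rank(relevant, ranking)

summary = {}
for metric, by_query in per_query.items():
    vals = list(by_query.values())
    # Mean and population std across queries.
    summary[metric] = {"mean": statistics.mean(vals),
                       "std": statistics.pstdev(vals)}

print(summary)
```

The per-query AP@5 values are 5/6 for `q_1` and 0.45 for `q_2`, so with two queries the population std is simply half their difference.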

The standard deviations of the metrics can later be accessed as follows:

run.std_scores

Output:

{'map@5': 0.19166666666666662, 'mrr': 0.25}
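To print the results in the "metric: mean, stddev: std" style suggested in the original request, a small formatting step over the dictionary returned by `evaluate(..., return_std=True)` is enough. The sketch below copies that output literally; `format_report` is a hypothetical helper, not part of ranx.

```python
# Aggregated results in the shape returned by evaluate(..., return_std=True).
results = {"map@5": {"mean": 0.6416666666666666, "std": 0.19166666666666662},
           "mrr": {"mean": 0.75, "std": 0.25}}


def format_report(results, digits=4):
    # Hypothetical helper: one "metric: mean, stddev: std" line per metric.
    return "\n".join(
        f"{metric}: {stats['mean']:.{digits}f}, stddev: {stats['std']:.{digits}f}"
        for metric, stats in results.items()
    )


print(format_report(results))
# map@5: 0.6417, stddev: 0.1917
# mrr: 0.7500, stddev: 0.2500
```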

Please consider giving ranx a star if you haven't yet. :)

jamm1985 commented 9 months ago

@AmenRa thank you!

AmenRa / ranx

[Feature Request] stddev statistic #57