AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
https://amenra.github.io/ranx
MIT License

[Feature Request] from ranx import Report #16

Closed. PaulLerner closed this issue 2 years ago.

PaulLerner commented 2 years ago

**Is your feature request related to a problem? Please describe.**
Hi! I’d like to be able to import `Report` so that I can load a previously saved report (the output of `compare`) and tweak the runs.

**Describe the solution you'd like**
`from .report import Report`

**Describe alternatives you've considered**
Re-run `compare` with different runs :sweat_smile:

PaulLerner commented 2 years ago

Mmh, actually it’s not straightforward to load a saved report into a new one. It would need something like a `from_dict` method that reverses `to_dict` (https://github.com/AmenRa/ranx/blob/master/ranx/report.py#L240).
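
For illustration only, here is a minimal, self-contained sketch of the `to_dict` / `from_dict` round-trip pattern being proposed. The toy class below is not ranx's `Report`; the attribute names and structure are placeholders:

```python
# Illustrative only: a toy class showing the to_dict / from_dict round-trip.
# This is NOT ranx's Report; attribute names and structure are placeholders.
class ToyReport:
    def __init__(self, model_names, results):
        self.model_names = model_names  # e.g. ["model_a", "model_b"]
        self.results = results          # e.g. {"model_a": {"map@100": 0.31}}

    def to_dict(self):
        return {"model_names": self.model_names, "results": self.results}

    @classmethod
    def from_dict(cls, d):
        # Exact inverse of to_dict: rebuild the object from its serialized form.
        return cls(model_names=d["model_names"], results=d["results"])


# Round-trip: the reconstructed report carries the same data as the original.
report = ToyReport(["model_a"], {"model_a": {"map@100": 0.31}})
restored = ToyReport.from_dict(report.to_dict())
```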

AmenRa commented 2 years ago

Hi Paul,

Please, give me a more detailed description of the use case. What do you mean by "tweak the runs"? Do you want to switch the compared runs without re-computing the metric scores, or do you need to expand the compared set of runs for printing tables?

As you pointed out, loading a report is not straightforward, and other stuff needs to be implemented first to make it doable.

I will add new functionality for saving/loading Runs with pre-computed metric scores (basically, saving/loading the data produced by the `evaluate` function, which `compare` also uses). This, plus a mechanism to avoid re-computing metric scores during comparisons, should make them much smoother.
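
For context, a minimal sketch of what the `evaluate` step looks like today (the JSON persistence at the end is just one way a user could cache the scores themselves; it is not a ranx feature):

```python
import json

from ranx import Qrels, Run, evaluate

qrels = Qrels.from_dict({"q_1": {"doc_a": 1}, "q_2": {"doc_b": 1}})
run = Run.from_dict({
    "q_1": {"doc_a": 0.9, "doc_c": 0.3},
    "q_2": {"doc_b": 0.8, "doc_d": 0.2},
})

# With multiple metrics, evaluate returns a dict of metric -> mean score.
scores = evaluate(qrels, run, ["map@100", "mrr@100", "ndcg@10"])

# Caching the scores by hand (illustration only, not a ranx API):
with open("run_scores.json", "w") as f:
    json.dump({metric: float(score) for metric, score in scores.items()}, f)
```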

Could this work in your use case?

PaulLerner commented 2 years ago

I meant "comparing only models A and B instead of A, B and C". Saving metric scores along with runs sounds great! But the statistical tests also take some time, don’t they? My idea was basically to separate the computation from the visualization/printing/formatting of the results.

AmenRa commented 2 years ago

I timed `compare` with five runs and `evaluate` with the same runs (`metrics=["map@100", "mrr@100", "ndcg@10"]`). I got 40+ seconds for `compare` and less than 1 second in total for `evaluate`. Fisher's Randomization Test is very slow; I'm not sure I can make it much faster.
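
To see why the randomization test dominates the runtime, here is a minimal sketch of a two-sided paired randomization test over per-query scores (illustrative code, not ranx's implementation): the statistic is recomputed for every random sign flip, so the cost grows with the number of permutations.

```python
import numpy as np


def paired_randomization_test(scores_a, scores_b, n_permutations=100_000, seed=42):
    """Two-sided paired randomization (Fisher) test on per-query metric scores."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)  # per-query differences
    observed = abs(diffs.mean())
    # Randomly flip the sign of each difference, i.e. randomly swap which
    # system each query's score is attributed to, n_permutations times.
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, diffs.size))
    permuted = np.abs((signs * diffs).mean(axis=1))
    # p-value: fraction of permutations at least as extreme as the observed difference.
    return float((permuted >= observed).mean())


# Example with made-up per-query AP scores for two systems.
print(paired_randomization_test([0.30, 0.52, 0.41, 0.60], [0.28, 0.47, 0.43, 0.51]))
```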

However, a few days ago I added the Two-sided Paired Student's t-Test on my local branch. `compare` with it takes 2.5 seconds on my machine.

I'll commit it in the afternoon so you can use that for faster comparisons.
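
Once that lands, a comparison using the t-test would look roughly like the sketch below (this assumes the test is selected through a `stat_test` argument, as in later ranx releases; the toy qrels and runs are placeholders):

```python
from ranx import Qrels, Run, compare

qrels = Qrels.from_dict({"q_1": {"doc_a": 1}, "q_2": {"doc_b": 1}})
run_a = Run.from_dict({"q_1": {"doc_a": 0.9, "doc_c": 0.2}, "q_2": {"doc_b": 0.8}})
run_b = Run.from_dict({"q_1": {"doc_c": 0.9, "doc_a": 0.3}, "q_2": {"doc_b": 0.7}})

# stat_test="student" is an assumption here; the default remains
# Fisher's randomization test. Check the ranx docs for the exact name.
report = compare(
    qrels,
    runs=[run_a, run_b],
    metrics=["map@100", "mrr@100", "ndcg@10"],
    stat_test="student",
    max_p=0.01,
)
print(report)
```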

AmenRa commented 2 years ago

The new version is out. Hope it helps.

PaulLerner commented 2 years ago

Thanks! So, should I close the issue?

AmenRa commented 2 years ago

Well, I don't know honestly :D

Do you think the speed of the comparison with the Two-sided Paired Student's t-Test "solves" your problem? Do you still need a report import function?

PaulLerner commented 2 years ago

Well, the issue was not really about speed (I’m patient enough even to run Fisher's Randomization Test). It was more generally about separating the computation from the visualization/printing/formatting of the results.

But anyway, consider it closed :) (and thanks again for your quick feedback)

AmenRa commented 2 years ago

You are always welcome! :)