Consider model ranking

It would be nice to have explicit model ranking for selection, i.e., something that answers the question of which model is the "best" among a group of trained models without human eyeballing (of course, human eyeballing is also great!). This would be in addition to Pareto-based selection, not a replacement for it.

Consider Caruana et al. 2004 "b" - https://dl.acm.org/doi/10.1145/1046456.1046470.

I have a prototype here: https://jphall663.github.io/GWU_rml/, code: https://nbviewer.org/github/jphall663/GWU_rml/blob/master/assignments/eval.ipynb.

In addition to the prototype, it would be really cool for users to be able to:

select the number and type of assessments, e.g., 3 assessments: AUC, max. ACC, and AIR (gets at balancing real-world selection criteria)
choose between random folds and user-selected segments (gets at weak spots and robustness)
perturb folds or data segments (gets at robustness)
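To make the first two items concrete, here is a minimal sketch in plain Python of user-selected assessments computed over either folds or user-selected segments. Everything in it is an assumption for illustration, not the prototype's code or a PiML API: the function names (`auc`, `max_acc`, `air`, `assess`), the row keys (`y`, `score`, `group`, `segment`), and the choice to treat a score at or above the cutoff as the favorable outcome when computing AIR.

```python
from collections import defaultdict


def auc(y_true, y_score):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def max_acc(y_true, y_score):
    """Best accuracy over cutoffs taken from the observed scores (plus 'predict all 0')."""
    best = 0.0
    for cut in set(y_score) | {float("inf")}:
        hits = sum((s >= cut) == (y == 1) for y, s in zip(y_true, y_score))
        best = max(best, hits / len(y_true))
    return best


def air(y_score, group, protected, reference, cut=0.5):
    """Adverse impact ratio: favorable rate of the protected group over the reference group.

    Assumption: a score >= cut is the favorable outcome; flip the test if it is not.
    """
    def rate(name):
        scores = [s for s, g in zip(y_score, group) if g == name]
        return sum(s >= cut for s in scores) / len(scores)

    ref = rate(reference)
    return rate(protected) / ref if ref > 0 else float("nan")


def assess(rows, split_of, metrics):
    """Group rows by split_of(index, row), then apply every metric to each group.

    split_of can assign folds (e.g. index % 5) or read a user-selected segment column.
    """
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[split_of(i, row)].append(row)
    return {name: {m: fn(part) for m, fn in metrics.items()}
            for name, part in parts.items()}
```

For example, `assess(rows, lambda i, r: r["segment"], {"auc": lambda p: auc([r["y"] for r in p], [r["score"] for r in p])})` would report AUC per user-selected segment, while `lambda i, r: i % 5` as the second argument would give five folds (shuffle the rows first to make them random).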

(The current prototype is fixed at five folds and five quality assessment statistics (no AIR, etc.), and does not perturb folds.)
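For discussion, one simple way to turn per-fold assessments into an explicit ranking is to rank the models within every (fold, assessment) cell and then average those ranks. The sketch below assumes that scheme; it is not necessarily what the prototype does. The names `average_ranks` and `rank_models` and the `scores[model][fold][metric]` layout are hypothetical, and every assessment is assumed to be "higher is better".

```python
def average_ranks(values):
    """Rank values with 1 = highest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_models(scores):
    """scores[model][fold][metric] -> value (higher is better).

    Returns [(model, mean_rank), ...] sorted best (lowest mean rank) first.
    """
    models = sorted(scores)
    folds = sorted(scores[models[0]])
    metrics = sorted(scores[models[0]][folds[0]])
    totals = {m: 0.0 for m in models}
    for fold in folds:
        for metric in metrics:
            cell = [scores[m][fold][metric] for m in models]
            for m, r in zip(models, average_ranks(cell)):
                totals[m] += r
    cells = len(folds) * len(metrics)
    return sorted(((m, totals[m] / cells) for m in models), key=lambda t: t[1])
```

Averaging ranks rather than raw values keeps assessments on different scales (e.g. AUC vs. AIR) from dominating one another, at the cost of ignoring how large the gaps between models are.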

Let me know if you'd like to discuss.

SelfExplainML / PiML-Toolbox

Consider model ranking #38