Open hannahilea opened 2 years ago
I think one nice benefit here is that it'll be easier to inspect results with several thresholds. For example, I could make a report that starts with threshold-independent metrics like ROC curves, PR curves, etc. Then I could show metrics for several choices of threshold, e.g. "sensitivity >= s1" for a few choices of s1, "minimize calibration error", or "max sensitivity * (1-fpr)" (i.e. from the ROC curve).
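To make the threshold-selection idea concrete, here is a rough, language-agnostic sketch in Python. It is not this package's API: `RocPoint`, `roc_points`, `threshold_for_min_sensitivity`, and `threshold_maximizing_product` are hypothetical names, and the sketch assumes binary labels and that a score at or above the threshold counts as a positive prediction.

```python
from typing import List, NamedTuple, Sequence


class RocPoint(NamedTuple):
    threshold: float
    sensitivity: float  # true positive rate
    fpr: float          # false positive rate


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[RocPoint]:
    """Threshold-independent step: one ROC point per distinct score.

    Assumption: a sample is predicted positive when score >= threshold.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append(RocPoint(t, tp / positives, fp / negatives))
    return points


def threshold_for_min_sensitivity(points: Sequence[RocPoint], s1: float) -> RocPoint:
    """Highest threshold whose sensitivity is still >= s1."""
    candidates = [p for p in points if p.sensitivity >= s1]
    if not candidates:
        raise ValueError("no threshold reaches the requested sensitivity")
    return max(candidates, key=lambda p: p.threshold)


def threshold_maximizing_product(points: Sequence[RocPoint]) -> RocPoint:
    """Threshold maximizing sensitivity * (1 - fpr)."""
    return max(points, key=lambda p: p.sensitivity * (1 - p.fpr))


# Toy data: four negatives and three positives.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]
points = roc_points(scores, labels)  # computed once, reused for every rule

for s1 in (0.6, 0.9):
    print(s1, threshold_for_min_sensitivity(points, s1))
print(threshold_maximizing_product(points))
```

The point of the sketch is that the ROC points are computed once, and each threshold rule is a small function applied to them afterwards, so adding another rule does not touch the threshold-independent step.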
Right now, `evaluation_metrics_row` is kludgy and tries to do way too much stuff, which makes it (a) hard to know exactly what your outputs relate to and (b) hard to customize any internal subfunctions without rewriting the entire loop (or threading some new param everywhere). It also combines a bunch of different types of output metrics into a single hard-to-parse schema (`EvaluationRow`). We'd like to refactor this!

Plan (as devised w/ @ericphanson): Split current `evaluation_metrics_row` function into three separate (types of) functions:

What is the output if you call all three steps?? `EvaluationRow`, which contains threshold-dependent AND threshold-independent metrics, and where each field contains results for all classes AND (when valid) a multiclass value. (i.e., some fields are per class, some fields are multiclass, some contain both; some fields are threshold dependent, some are threshold independent.)

Other important points: