Closed by MartinBernstorff 1 year ago
Tests | Skipped | Failures | Errors | Time |
---|---|---|---|---|
47 | 1 :zzz: | 0 :x: | 0 :fire: | 52.029s :stopwatch: |
Do our models automatically adjust the threshold? Where is this logged? But yes, I feel like it is a decent solution to just support one metric per function for now and keep metric-specific functions, even though this means duplicating shared functionality across functions.
The threshold is something we have to decide on manually - it's based on clinical utility on a case-by-case basis, since the costs of false positives and false negatives are dramatically different between projects.
Yes of course, makes sense. But where do we currently specify this? Or do we just try out thresholds post-training? Because all e.g. sensitivity functions/plots need to be given a threshold and currently they're just using a default 0.5 threshold. :)
Ah, I see! We only specify it when calling the functions. Currently leaning towards doing all model eval post-training except ROC-AUC (#452).
For most plots, we want to use the same threshold, so there's definitely an argument for making it a config option somewhere when we do post-training model eval.
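If we do make the threshold a config option, one minimal way to sketch it (the `EvalConfig` class, `positive_threshold` field, and `binarise` helper here are hypothetical names for illustration, not actual repo code):

```python
from dataclasses import dataclass

import numpy as np
from sklearn.metrics import recall_score, roc_auc_score


@dataclass
class EvalConfig:
    # Default of 0.5 mirrors what the plots currently assume; in practice this
    # would be set per project based on clinical utility.
    positive_threshold: float = 0.5


def binarise(y_hat_probs: np.ndarray, cfg: EvalConfig) -> np.ndarray:
    """Convert predicted probabilities to hard labels using the shared threshold."""
    return (y_hat_probs >= cfg.positive_threshold).astype(int)


y_true = np.array([0, 0, 1, 1])
y_hat_probs = np.array([0.1, 0.4, 0.35, 0.8])
cfg = EvalConfig(positive_threshold=0.3)

# ROC-AUC is threshold-free; sensitivity (recall) uses the configured threshold.
auc = roc_auc_score(y_true, y_hat_probs)
sensitivity = recall_score(y_true, binarise(y_hat_probs, cfg))
```

Every threshold-dependent plot would then read the same `positive_threshold` from the config instead of hard-coding 0.5.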
Agree with Jakob's point about the threshold. Maybe it could make sense to enable multiple metrics for the plot function, but limit it to e.g. the three most used? Otherwise, it's fine with us to create plots per metric for now and then create additional plots when we need them.
Excellent, thanks! I'll pare it down to the simplest possible version for now, and then we can add complexity when we need it.
!! Disregard the commits for now. This is more of a discussion of where to go from here. !!
We have a fundamental problem here: a bunch of our functions support any arbitrary metric function, but that means the input arguments interact with one another. E.g. `roc_auc_score` needs `y_hat_probs`, whereas sensitivity needs `y_hat_probs` binarised at a given threshold (`y_hat_int`).

This means that a function like `plot_performance_by_sex` is super hard to write. Essentially we'd need a function that maps each `metric_fn` to its inputs. I can see that I got started on that a while ago with `metric_fn_to_input`, but it hasn't been used in the repo.

That makes the interface a pain to work with! I don't think this complexity is worth it to support any arbitrary metric function. Instead, I suggest we only support one metric per function for now, to make life simpler.
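For context on why the arbitrary-metric route gets messy, here's a rough sketch of what a `metric_fn_to_input`-style dispatch would have to do (the mapping and the `score` helper are illustrative, not the actual repo code):

```python
from typing import Callable

import numpy as np
from sklearn.metrics import recall_score, roc_auc_score


def _threshold(y_hat_probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarise probabilities at a fixed threshold (hypothetical default of 0.5)."""
    return (y_hat_probs >= threshold).astype(int)


# Each metric function must be paired with a transform producing the input
# representation it expects - this is the coupling described above.
metric_fn_to_input: dict[Callable, Callable] = {
    roc_auc_score: lambda probs: probs,  # wants raw probabilities
    recall_score: _threshold,            # wants thresholded hard labels
}


def score(metric_fn: Callable, y_true: np.ndarray, y_hat_probs: np.ndarray) -> float:
    """Dispatch to the right input representation before scoring."""
    return metric_fn(y_true, metric_fn_to_input[metric_fn](y_hat_probs))


y_true = np.array([0, 0, 1, 1])
y_hat_probs = np.array([0.1, 0.4, 0.35, 0.8])
auc = score(roc_auc_score, y_true, y_hat_probs)
sens = score(recall_score, y_true, y_hat_probs)
```

Every new metric means another entry in the mapping, plus threading threshold arguments through every plotting function - which is exactly the complexity the one-metric-per-function approach avoids.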
@bokajgd, @sarakolding, @HLasse, @KennethEnevoldsen - any opinions on this?
Follows on #447.