Closed AlineTalhouk closed 7 years ago
Yes that is how class predictions are made. What do you mean by setting a threshold?
I mean saying that if the probability is not greater than 0.5 for any class, we assign an "unclassified" label
what happens to the predicted class then
sorry I meant, why are we doing this
What this is supposed to do is classify only the cases for which you are sure about the classification. You can change the threshold to match the amount of sensitivity/specificity needed for the clinical application.
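The thresholding idea above could be sketched like this in R (the function name and inputs are illustrative, not the package's actual code):

```r
# Sketch of threshold-based classification: keep the highest-probability
# class only when its probability exceeds the threshold.
predict_with_threshold <- function(probs, threshold = 0.5) {
  classes <- colnames(probs)
  best <- max.col(probs)                      # column index of the max probability
  pred <- classes[best]
  top <- probs[cbind(seq_len(nrow(probs)), best)]
  pred[top <= threshold] <- "unclassifiable"  # not sure enough -> don't classify
  pred
}

probs <- rbind(c(a = 0.7, b = 0.2, c = 0.1),
               c(a = 0.4, b = 0.35, c = 0.25))
predict_with_threshold(probs)  # "a" "unclassifiable"
```

Raising the threshold trades coverage for confidence: fewer cases are classified, but those that are carry higher class probabilities.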
How does this impact evaluation metrics then? Do we only compare the classified predictions with the corresponding true class labels?
yes. Isn't that what we currently do?
I mean the probabilities don't change only the class predictions
Well currently, all predictions are classified. For example, looking at one run of xgboost, if we set a threshold of 0.5 the number of classified predictions decreases to 40% of the original test sample.
Yes probabilities don't change, but class predictions do and they form the confusion matrices from which evaluation metrics are calculated.
Say, for example, instead of having c(4, 4, 3, 1, 2, 3), after thresholding I might have c(4, NA, NA, 1, 2, 3), in which case I would remove the corresponding indices (2nd and 3rd) in the true class labels before constructing the confusion matrix.
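The filtering step described in that example might look like this minimal R sketch (variable and level names are illustrative):

```r
pred  <- c(4, NA, NA, 1, 2, 3)  # class predictions after thresholding
truth <- c(4, 4, 3, 1, 2, 3)    # true class labels
keep  <- !is.na(pred)           # drops the 2nd and 3rd elements

# Forcing the factor levels keeps the confusion matrix square even if a
# class was filtered out entirely by the threshold.
cm <- table(truth = factor(truth[keep], levels = 1:4),
            pred  = factor(pred[keep],  levels = 1:4))
cm
```

Evaluation metrics would then be computed from `cm`, i.e. only over the cases that were actually classified.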
yes, but you just ignore those cases (you filter them out), so in your example you exclude the 2nd and 3rd elements that correspond to NA
yeah that's what I was asking here
Do we only compare the classified predictions with the corresponding true class labels?
I just want to touch base on all propagating side-effects before modifying the code so we're on the same page.
yes. You would have to create a label to filter on, "unclassifiable" rather than NA, so it's not confused with an error. We should be able to dial this between a default, which classifies all cases, and a threshold, which applies stricter classification rules. The threshold could be optimized by ROC curves, say
Yes, I was planning to use "unclassifiable" or some other string, not NA; that was just an example. I will keep the complete predictions and add an attr, maybe called class_threshold, that shows the same predictions but with some unclassified cases, and also an attr for the proportion of classified cases, maybe called proportion.
Implementation changes:

- threshold value (default = 0.5) for lowest max class probability added as attr named class.thres (accessible with purrr::`%@%`)
- attr named class.prop for the proportion of classified cases
- if predictions have n' < n classes where n is the number of true classes (e.g. qda), add back missing levels to the class factor to ensure the confusion matrix is square (forcats might need to be installed first)
- in some cases (e.g. xgboost), use the original predicted class labels for evaluation
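The attribute changes in that list could be sketched roughly as follows; the function name and inputs here are invented, only the attribute names class.thres and class.prop come from the list:

```r
# Hypothetical sketch: attach the thresholded predictions and the proportion
# of classified cases as attributes, keeping the complete predictions intact.
add_threshold_attrs <- function(pred, probs, threshold = 0.5) {
  thresholded <- as.character(pred)
  thresholded[apply(probs, 1, max) <= threshold] <- "unclassifiable"
  attr(pred, "class.thres") <- thresholded
  attr(pred, "class.prop")  <- mean(thresholded != "unclassifiable")
  pred
}

probs <- rbind(c(0.9, 0.1), c(0.5, 0.5))
out <- add_threshold_attrs(c("a", "a"), probs)
attr(out, "class.prop")  # 0.5
```

With purrr loaded, the attributes can then be pulled out as `out %@% "class.prop"` or `out %@% "class.thres"`.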
@dchiu911 do you know how to go from probability to class prediction? is it the highest probability? is there a way to set a threshold?