EricArcher / banter

banter is a package for creating hierarchical acoustic event classifiers out of multiple call type detectors.

ConfusionMatrix Threshold results don't match text in Banter Guide #7

Closed: sfregosi closed this issue 5 months ago

sfregosi commented 5 months ago

Hi @EricArcher,

I was working through the Banter Guide (accessed via the package, which directed me here: https://taikisan21.github.io/PAMpal/banterGuide.html#242_Random_Forest_Summaries) and noticed a possible issue (or an error in interpretation on my part!).

In section 2.4.2 > Confusion Matrix, in the example with the 0.8 threshold, the numbers in the last column (Pr.gt_0.8) don't match what is written in the text below it.

Running my own model through this same snippet of code, I also get an extremely small number for one species (1.2e-119), a value near 1 for the other (9.9e-1), and exactly 1 for the Overall row.

# Confusion Matrix with medium threshold
confusionMatrix(bant.rf, threshold = 0.8)
        X33 X577 pct.correct LCI_0.95  UCI_0.95     Pr.gt_0.8
X33      20    0   100.00000 83.15665 100.00000 1.220165e-119
X577      4  204    98.07692 95.14962  99.47360  9.999580e-01
Overall  NA   NA    98.24561 95.56924  99.51997  1.000000e+00
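
(For context, bant.rf here is the event-level random forest pulled out of my banter model, along the lines of the sketch below; bant.mdl is just a placeholder name for my trained model, and I believe confusionMatrix() is the rfPermute method applied to the underlying randomForest object.)

library(banter)
library(rfPermute)

# Extract the event-level randomForest object from a trained banter model
# ('bant.mdl' is a placeholder for a model built with initBanterModel(),
# addBanterDetector(), and runBanterModel())
bant.rf <- getBanterModel(bant.mdl, model = "event")

# Confusion matrix with a Pr.gt_0.8 column summarizing assignment scores
confusionMatrix(bant.rf, threshold = 0.8)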

I was going to interpret this as a very low probability that an X33 event will be predicted as X33 with a score > 0.8, but a very high probability that an X577 event will be predicted as X577 with a score > 0.8. Is that correct?
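
As a rough sanity check of that reading (this is just my own approximation; I don't know the exact computation behind the Pr.gt column), I was looking at the out-of-bag vote proportions directly:

# Per true class, the proportion of events whose OOB vote for the
# correct class exceeds 0.8 (a rough empirical analogue of Pr.gt_0.8)
true.class <- bant.rf$y
vote.idx <- cbind(seq_along(true.class), as.numeric(true.class))
correct.vote <- bant.rf$votes[vote.idx]
tapply(correct.vote > 0.8, true.class, mean)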

Thank you! Selene

EricArcher commented 5 months ago

Yes, that is a correct interpretation of the last column, which summarizes the distribution of assignment scores. Your case illustrates why I include that metric in the output: you have perfect classification for that class, yet the assignment scores themselves rarely exceed the 0.8 threshold. To confirm this (and to understand more about the assignment probability distribution), use the plotVotes() function.
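
For example (a minimal sketch; plotVotes() is the rfPermute function and works on the extracted randomForest object):

library(rfPermute)

# Plot the distribution of assignment scores (OOB votes) by true class;
# this shows why Pr.gt_0.8 can be near zero even with 100% correct classification
plotVotes(bant.rf)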

sfregosi commented 5 months ago

Ok, great. Glad I was on the right track. Thank you!