LSI computes a lot of information about the quality of a
classification guess that is hidden from the caller.
Sometimes the guess is of very high quality and sometimes it's a
toss-up, but the caller has no way to distinguish the two.
It would be nice if there were an option to get not only the
classification but also a measure of the system's confidence.
I propose adding a method classify_with_confidence(text) that returns
both the classification AND the confidence as a number between
0.0 and 1.0, e.g.:

  guess, confidence = lsi.classify_with_confidence(text)

A caller could then choose to ignore any classification guess whose
confidence falls below some threshold.
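As a rough sketch of how such a confidence could be derived, one option is to normalize the margin between the top two per-category similarity scores. This is a hypothetical, self-contained illustration: the method name, the `scores` hash, and the margin-based formula are all assumptions, not the gem's actual internals.

```ruby
# Hypothetical sketch: derive a 0..1 confidence from per-category
# similarity scores. The scoring scheme here is an assumption for
# illustration, not the library's real implementation.
def classify_with_confidence(scores)
  # Rank categories from best score to worst.
  sorted = scores.sort_by { |_, score| -score }
  best, runner_up = sorted[0], sorted[1]

  # Confidence is the gap between the top two scores, normalized
  # so it falls between 0.0 (toss-up) and 1.0 (clear winner).
  margin = best[1] - runner_up[1]
  total  = best[1].abs + runner_up[1].abs
  confidence = total.zero? ? 0.0 : margin / total

  [best[0], confidence]
end

guess, confidence = classify_with_confidence("Sports" => 0.92, "Politics" => 0.15)
```

With this shape, a near-tie between the top two categories yields a confidence near 0.0, while a dominant top score yields a confidence near 1.0, which gives callers the threshold knob described above.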