Confidence when using text classification

I am developing an intent classification model using the text classification pipeline. The model predicts an intent from a question supplied by the user. Things work relatively well when the phrases are close to the training set, but I start getting seemingly random results for totally unrelated input. Looking at the scores, they appear to be normalized so that they add up to 1. That means I can get a relatively high score (often around 0.8) for a phrase that has nothing to do with any question in the training set. I am wondering whether there is: a) a way to get unnormalized scores, e.g. scores whose sum falls in the range [0, 1], where the closer the sum is to 1, the higher the confidence; b) a general confidence value that would simply tell me whether I should trust the output of the prediction.
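To illustrate the effect described above, here is a minimal Python sketch. It assumes the pipeline applies a softmax-style normalization to raw per-class scores, which is a guess based on the scores summing to 1 and not something confirmed for this pipeline. The raw scores, the labels, and the `predict_with_rejection` helper with its thresholds are all hypothetical.

```python
import math

def softmax(raw_scores):
    """Normalize raw scores so they are positive and sum to 1."""
    m = max(raw_scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_rejection(scores, labels, min_score=0.9, min_margin=0.5):
    """Return (label, best_score), or (None, best_score) when the top
    score is too low or too close to the runner-up.
    Both thresholds are illustrative and would need tuning on held-out data.
    Expects at least two classes."""
    ranked = sorted(zip(scores, labels), reverse=True)
    (best, label), (second, _) = ranked[0], ranked[1]
    if best < min_score or best - second < min_margin:
        return None, best
    return label, best

# Hypothetical raw scores for an unrelated phrase: none is large in
# absolute terms, yet normalization still pushes the winner to roughly 0.79.
labels = ["greeting", "order_status", "cancel_order"]
probs = softmax([2.0, 0.3, -0.5])
print(probs, sum(probs))
print(predict_with_rejection(probs, labels))  # rejected: top score is below 0.9
```

Because normalization only compares classes against each other, a high normalized score says "more like this intent than the others", not "similar to the training data". A rejection threshold of this kind is a cheap first step, but it cannot fully solve the problem for inputs that happen to produce one dominant raw score.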

I was considering creating two models. The first would be a binary classifier trained on a bunch of random text labeled 0 and the actual training phrases labeled 1. The second would be trained on just the training phrases. The first model would then give me the confidence, while the second would give me the actual result. This seems like huge overkill, though, especially considering that I would probably have to include thousands of text samples carefully picked not to resemble the actual training phrases. It seems awful, but nothing else comes to mind. Has anybody approached this in a more reasonable way?
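For reference, the two-model idea can be sketched roughly as follows in Python. The two callables stand in for trained models and are hypothetical, as are the `classify_with_gate` name and the 0.5 threshold. Multiplying the two values into a single confidence is just one possible choice and assumes both outputs behave like probabilities.

```python
from typing import Callable, Optional, Sequence, Tuple

def classify_with_gate(
    text: str,
    in_domain_probability: Callable[[str], float],   # model 1: random text (0) vs. training phrases (1)
    intent_scores: Callable[[str], Sequence[float]],  # model 2: normalized per-intent scores
    labels: Sequence[str],
    gate_threshold: float = 0.5,                      # illustrative value, needs tuning
) -> Tuple[Optional[str], float]:
    """Run the gate first and only ask the intent model when the text
    looks like the training data. Returns (intent or None, confidence)."""
    p_in = in_domain_probability(text)
    if p_in < gate_threshold:
        return None, p_in  # treated as unrelated input
    scores = intent_scores(text)
    best = max(range(len(labels)), key=lambda i: scores[i])
    # Combined confidence: probability of being in-domain times the
    # probability of the chosen intent given that it is in-domain.
    return labels[best], p_in * scores[best]
```

A caller would pass in whatever prediction functions wrap the two trained models, then treat a `None` intent as "ask the user to rephrase".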

dotnet / machinelearning

Confidence when using text classification #5743