GateNLP / gateplugin-LearningFramework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through Pytorch and Keras.
https://gatenlp.github.io/gateplugin-LearningFramework/
GNU Lesser General Public License v2.1

Properly implement classification confidence scores #68

Open johann-petrak opened 6 years ago

johann-petrak commented 6 years ago

Confidence score handling is a bit messy at the moment. We should make sure we always assign the correct confidence score to a classification (and, if possible, to all class labels) whenever the algorithm returns them, and that we handle things consistently when an algorithm does not return them or does not return the full list for all classes. This also needs to work for algorithms that use the dense corpus representation: there the LF does not know any indices for class labels, so we cannot use an array of class confidence scores.

Currently there is also a discrepancy between classification and chunking in how null or Double.NaN is handled as the value of a classification. Make sure we do this right in the chunking code (ModelApplication.addSurroundingAnnotation, although this should really be moved into the SeqEncoderDecoder classes).
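
To make the idea concrete, here is a rough sketch of one way a result could carry scores without relying on label indices. It keys the per-class scores by label string instead of using an array, and it folds Double.NaN into null so that callers only have one "no confidence" case to check. `LabelScores` and all of its members are hypothetical names for illustration, not existing LearningFramework classes, and choosing null as the single "not reported" value is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical value type for illustration; not part of the LearningFramework API.
final class LabelScores {
    final String label;                 // predicted class label
    final Double confidence;            // null when the algorithm reports none
    final Map<String, Double> perLabel; // empty when no per-class scores are available

    LabelScores(String label, Double confidence, Map<String, Double> perLabel) {
        this.label = label;
        // Assumption: treat NaN the same as "not reported", so callers only check for null.
        this.confidence = (confidence == null || confidence.isNaN()) ? null : confidence;
        this.perLabel = (perLabel == null)
            ? Collections.<String, Double>emptyMap()
            : Collections.unmodifiableMap(new LinkedHashMap<>(perLabel));
    }

    // Build a result from scores keyed by label: the highest-scoring label wins.
    // Works without label indices, and also when only some labels have scores.
    static LabelScores fromDistribution(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            Double v = e.getValue();
            if (v != null && !v.isNaN() && v > bestScore) {
                best = e.getKey();
                bestScore = v;
            }
        }
        Double conf = (best == null) ? null : Double.valueOf(bestScore);
        return new LabelScores(best, conf, scores);
    }

    boolean hasConfidence() {
        return confidence != null;
    }
}
```

With something like this, an algorithm that returns only a label would construct `new LabelScores(label, null, null)`, and both the classification and the chunking code could test `hasConfidence()` instead of each handling null and NaN separately.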

johann-petrak commented 6 years ago

Also, for both classification and chunking, the confidenceThreshold parameter should be optional, with a default of null (not specified), in which case the confidence is not checked at all.
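
A minimal sketch of what that check could look like, assuming the parameter is held as a nullable `Double`. `ConfidenceFilter.accept` is a hypothetical helper, not existing plugin code. Rejecting a prediction that has no usable confidence when a threshold is set is also an assumption; the opposite policy would be just as easy to implement.

```java
// Hypothetical helper for illustration; not part of the LearningFramework API.
final class ConfidenceFilter {

    // Returns true if a prediction with the given confidence should be kept.
    // threshold == null means the parameter was left empty: no check at all.
    static boolean accept(Double confidence, Double threshold) {
        if (threshold == null) {
            return true;
        }
        // Assumption: with a threshold set, a missing or NaN confidence is rejected.
        if (confidence == null || confidence.isNaN()) {
            return false;
        }
        return confidence >= threshold;
    }
}
```

Using the same helper from both the classification and the chunking application would keep the two code paths consistent.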

johann-petrak commented 6 years ago

Leaving the confidenceThreshold parameter empty to skip the confidence check is now implemented for the classification application.

johann-petrak commented 6 years ago

Leaving the confidenceThreshold parameter empty to skip the confidence check is now implemented for the chunking application as well.