GateNLP / gateplugin-LearningFramework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through Pytorch and Keras.
https://gatenlp.github.io/gateplugin-LearningFramework/
GNU Lesser General Public License v2.1

Think about how to add per-instance cost to classification. #34

Closed: johann-petrak closed this issue 8 years ago

johann-petrak commented 8 years ago

The easiest way to do this is probably to use a list of costs in place of the actual target, e.g. "[2.0, 1.3, 0.0, 1.4]", where the correct class is the one with the lowest cost, here index 2 (cost 0.0). In this case, the prediction would be the class index instead of the class itself. Alternatively, the target could be a map of class->cost entries, in which case we could predict the class itself. Either way, we have to handle this so that the cost information is preserved for algorithms which can use it and is ignored (except for determining the correct target class) by all other algorithms.
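To make the two target encodings concrete, here is a minimal Python sketch. It is not code from the plugin: the function names `target_from_costs` and `prepare_instance`, the flag `algorithm_uses_costs`, and the class labels in the map are all hypothetical, chosen only for illustration. It assumes the correct class is the entry with the lowest cost.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical type for a cost-valued target: either a list of costs
# (position = class index) or a map from class label to cost.
CostTarget = Union[List[float], Dict[str, float]]


def target_from_costs(costs: CostTarget) -> Tuple[Union[int, str], CostTarget]:
    """Return (correct_class, costs); the correct class has the lowest cost.

    For a list the class is reported as an index, for a map as the label.
    On ties the first minimal entry wins.
    """
    if isinstance(costs, dict):
        label = min(costs, key=costs.get)
        return label, dict(costs)
    index = min(range(len(costs)), key=lambda i: costs[i])
    return index, list(costs)


def prepare_instance(
    costs: CostTarget, algorithm_uses_costs: bool
) -> Tuple[Union[int, str], Optional[CostTarget]]:
    """Keep the costs for cost-aware algorithms, drop them for all others."""
    target, kept = target_from_costs(costs)
    return (target, kept) if algorithm_uses_costs else (target, None)


# The list form from the comment above: index 2 has cost 0.0.
list_target = [2.0, 1.3, 0.0, 1.4]
# The map form, with made-up class labels.
map_target = {"person": 2.0, "location": 1.3, "organization": 0.0, "other": 1.4}

print(prepare_instance(list_target, algorithm_uses_costs=False))
print(prepare_instance(map_target, algorithm_uses_costs=True))
```

With the list form only an index can be recovered, whereas the map form keeps the class label, which is the trade-off described above.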

johann-petrak commented 8 years ago

This is now implemented. So far, only the Python package costcla supports costs, but 1) only for binary classification and 2) it needs the full matrix of costs for TP, FP, TN and FN. We have added an experimental costcla wrapper to sklearn-wrapper, but it does not work properly yet because pickling does not work. Putting this on hold for now until we have found a usable alternative or costcla has been made to work for us.
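For illustration, the following sketch shows what a per-instance cost matrix over TP, FP, TN and FN means for binary classification. It uses plain Python and does not use or reproduce the costcla API; the function names and the dictionary layout are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical per-instance cost record: one cost per outcome type.
Costs = Dict[str, float]


def outcome(y_true: int, y_pred: int) -> str:
    """Name the confusion-matrix cell for one binary prediction (1 = positive)."""
    if y_true == 1:
        return "TP" if y_pred == 1 else "FN"
    return "FP" if y_pred == 1 else "TN"


def total_cost(y_true: List[int], y_pred: List[int], costs: List[Costs]) -> float:
    """Sum the cost each instance incurs for the outcome it actually got."""
    return sum(c[outcome(t, p)] for t, p, c in zip(y_true, y_pred, costs))


def min_cost_prediction(prob_positive: float, cost: Costs) -> int:
    """Pick the label with the lower expected cost for one instance."""
    cost_if_pos = prob_positive * cost["TP"] + (1 - prob_positive) * cost["FP"]
    cost_if_neg = prob_positive * cost["FN"] + (1 - prob_positive) * cost["TN"]
    return 1 if cost_if_pos < cost_if_neg else 0


# Made-up costs: a missed positive is five times as costly as a false alarm.
c = {"TP": 0.0, "FP": 1.0, "TN": 0.0, "FN": 5.0}

print(total_cost([1, 0, 1], [1, 1, 0], [c, c, c]))
print(min_cost_prediction(0.3, c))
print(min_cost_prediction(0.1, c))
```

Because all four outcome costs are needed for every instance, the single list of per-class costs proposed earlier in this issue is not enough on its own for this kind of learner.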