GateNLP / gateplugin-LearningFramework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through Pytorch and Keras.
https://gatenlp.github.io/gateplugin-LearningFramework/
GNU Lesser General Public License v2.1

Add a way to look up the numeric representation of a nominal attribute. #10

Closed johann-petrak closed 8 years ago

johann-petrak commented 8 years ago

This is mainly for nominal attributes that represent ngrams: it should be possible to look up a value such as idf for each nominal value, with a default for any nominal value that is not in the list. The final value used is then origValue * lookedUpValue, where origValue is either 1.0 or the count if tf counting is enabled (see #7). This way we can do any idf or other scoring calculation on the training corpus beforehand and just use the result at training time.

This may best be done by representing the lookup values as an LR. That would let us generalize to different kinds of LR, which could also represent e.g. pre-initialized scaling functions for all attributes. Attribute scaling could then use the same approach, relying on a representation of the scaling function created in a separate step by a separate program or PR.
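
A minimal sketch of the lookup-and-multiply step described above, written in Python for illustration only. It is not the plugin's code: the function name `weight_ngrams`, its parameters, and the idf numbers are all hypothetical, and a plain dict stands in for whatever LR would hold the lookup values.

```python
from collections import Counter


def weight_ngrams(ngrams, lookup, default=1.0, use_counts=True):
    """Return {ngram: origValue * lookedUpValue} for one instance.

    Hypothetical helper: `lookup` maps a nominal value to a precomputed
    score (e.g. idf); `default` is used for values missing from it.
    """
    counts = Counter(ngrams)
    weighted = {}
    for value, count in counts.items():
        # origValue is the count when tf counting is on, otherwise 1.0
        orig_value = float(count) if use_counts else 1.0
        # fall back to the default for values not in the lookup table
        weighted[value] = orig_value * lookup.get(value, default)
    return weighted


# Made-up idf scores, as if computed on the training corpus beforehand
idf = {"the": 0.1, "cat": 2.0}
features = weight_ngrams(["the", "cat", "the", "dog"], idf, default=1.5)
binary_features = weight_ngrams(["the", "cat", "the", "dog"], idf,
                                default=1.5, use_counts=False)
```

Here "dog" is absent from the table and so receives the default, while "the" is scaled by its count only when `use_counts` is true.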

johann-petrak commented 8 years ago

Closing for now since we have something similar by means of the FEATURENAME4VALUE setting.