A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through PyTorch and Keras.
This is mainly for nominal attributes that represent ngrams: it should be possible to look up something like idf for each value, with some default for a nominal value that is not in the list.
The final value used is then origValue * lookedUpValue, where origValue is either 1.0 or the count, if tf counting is enabled (see #7).
This way we can do any idf or other scoring calculation on the training corpus beforehand and just use this at training time.
This may best be done by representing the lookup values as an LR (language resource). That would allow us to generalize to different kinds of LR, which could also represent e.g. pre-initialized scaling functions for all attributes: one way of doing attribute scaling could then use the same approach, relying on a representation of the scaling function created in a separate step by a separate program or PR.
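A minimal sketch of the lookup idea described above. The class name `NgramScoreLookup` and its methods are hypothetical, not part of the plugin; the sketch only shows the intended calculation: a precomputed score table (e.g. idf computed over the training corpus in a separate step), a default score for unseen values, and the final value origValue * lookedUpValue.

```java
import java.util.HashMap;
import java.util.Map;

public class NgramScoreLookup {
    // Precomputed per-value scores (e.g. idf), filled in a separate step
    // before training; values not in the table get defaultScore.
    private final Map<String, Double> scores = new HashMap<>();
    private final double defaultScore;

    public NgramScoreLookup(double defaultScore) {
        this.defaultScore = defaultScore;
    }

    public void put(String value, double score) {
        scores.put(value, score);
    }

    // origValue is 1.0, or the tf count when tf counting is enabled (see #7).
    public double finalValue(String value, double origValue) {
        return origValue * scores.getOrDefault(value, defaultScore);
    }
}
```

In the LR-based design above, this table would live behind a language resource interface so that other score sources (such as scaling functions) can be plugged in the same way.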