A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through PyTorch and Keras.
See #104. For now, implement filtering based only on a missing featureName4Value value (null or 0.0): if an attribute is defined to use such a feature and the feature is absent or 0.0, skip the feature for that attribute and do not generate any ngrams that include it. For ngrams, this would replace the current imputing of 1.0 in that situation.
Note that for ngrams with n>1, filtering is more complex: we have to drop every ngram that would include the filtered string, rather than treating the string as if it did not exist.
This way of filtering would then be the only effective way to avoid generating ngrams from non-consecutive tokens when some tokens are filtered: if we naively just remove the Token annotation, we would create ngrams from the now-adjacent tokens, which should probably be avoided.
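
To make the difference concrete, here is a minimal, language-neutral sketch in Python (not the plugin's actual Java code). The function names `filtered_ngrams` and `naive_ngrams`, the `(string, value)` token representation, and the `_` separator are all assumptions made for illustration; `value` stands in for the featureName4Value feature, with `None` meaning the feature is missing.

```python
from typing import List, Optional, Tuple

# Hypothetical token representation: (string, value), where value stands in
# for the featureName4Value feature and None means it is missing.
Token = Tuple[str, Optional[float]]


def is_filtered(token: Token) -> bool:
    """A token is filtered if its value is missing (None) or 0.0."""
    _, value = token
    return value is None or value == 0.0


def filtered_ngrams(tokens: List[Token], n: int) -> List[str]:
    """Proposed behaviour: drop every ngram that contains a filtered token."""
    ngrams = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if any(is_filtered(t) for t in window):
            continue  # skip the whole ngram, keep positions intact
        ngrams.append("_".join(s for s, _ in window))
    return ngrams


def naive_ngrams(tokens: List[Token], n: int) -> List[str]:
    """Naive behaviour: remove filtered tokens first, then build ngrams.

    This joins tokens that were not adjacent in the original text.
    """
    kept = [t for t in tokens if not is_filtered(t)]
    return ["_".join(s for s, _ in kept[i:i + n])
            for i in range(len(kept) - n + 1)]


tokens = [("the", 1.0), ("quick", 0.5), ("brown", None),
          ("fox", 2.0), ("jumps", 1.5)]

print(filtered_ngrams(tokens, 2))  # ['the_quick', 'fox_jumps']
print(naive_ngrams(tokens, 2))     # ['the_quick', 'quick_fox', 'fox_jumps']
```

The naive variant produces `quick_fox`, a bigram of tokens that were never adjacent; the window-based variant avoids it because filtering happens per ngram rather than per token.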