A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through PyTorch and Keras.
Keep the option of having many features, but make it easy to use just the simple one-feature approach.
Store dense corpus instances as one map per line, with standard keys for the label and possibly the features.
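A minimal sketch of a one-map-per-line format, assuming JSON as the map encoding and the placeholder key names `features` and `label` (neither is fixed by these notes). Making the label optional lets the same format hold unlabeled data.

```python
import json

# Hypothetical key names: the notes only ask for "standard keys for label
# and possibly features", so these are placeholders.
LABEL_KEY = "label"
FEATURES_KEY = "features"


def instance_to_line(features, label=None):
    """Serialise one dense instance as a single-line JSON map.

    The label key is omitted when there is no label, so unlabeled
    instances use the same format.
    """
    record = {FEATURES_KEY: features}
    if label is not None:
        record[LABEL_KEY] = label
    return json.dumps(record, sort_keys=True)


def line_to_instance(line):
    """Parse one line back into (features, label); label is None if absent."""
    record = json.loads(line)
    return record[FEATURES_KEY], record.get(LABEL_KEY)


lines = [
    instance_to_line([0.5, 1.0, 0.0], label="POS"),
    instance_to_line([0.1, 0.2, 0.3]),  # unlabeled instance
]
```

Because each line is an independent map, a reader can stream the corpus one line at a time and simply ignore keys it does not need.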
Unify this with the representation for unlabeled data (e.g. for embedding creation or topic models) and for other kinds of supervised/unsupervised tasks, e.g. seq2seq or semantic similarity.
!!!! Change the representation of sequences: instead of a sequence of elements that each have multiple features, have one sequence per feature. This makes it MUCH easier to create batches later.
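The sketch below illustrates why per-feature sequences simplify batching. It assumes every element in a sequence carries the same feature names and that features are already integer-encoded so that 0 can serve as padding. All function names are hypothetical.

```python
from typing import Any, Dict, List


def elements_to_feature_sequences(elements: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn [{f1: a, f2: b}, {f1: c, f2: d}] into {f1: [a, c], f2: [b, d]}.

    Assumes all elements share the feature names of the first element.
    """
    names = list(elements[0].keys()) if elements else []
    return {name: [el[name] for el in elements] for name in names}


def pad_batch(sequences: List[List[Any]], pad_value: Any = 0) -> List[List[Any]]:
    """Right-pad every sequence to the length of the longest one."""
    max_len = max((len(s) for s in sequences), default=0)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]


def make_batch(instances: List[Dict[str, List[Any]]], pad_value: Any = 0) -> Dict[str, List[List[Any]]]:
    """Build one padded 2D list per feature from per-feature-sequence instances."""
    names = instances[0].keys()
    return {name: pad_batch([inst[name] for inst in instances], pad_value) for name in names}


# Old representation: a sequence of elements, each with several features.
seq1 = [{"token": 5, "pos": 1}, {"token": 7, "pos": 2}]
seq2 = [{"token": 3, "pos": 1}]

# New representation: one sequence per feature.
fs1 = elements_to_feature_sequences(seq1)
fs2 = elements_to_feature_sequences(seq2)
batch = make_batch([fs1, fs2])
```

With one sequence per feature, each feature's batch is a single padded matrix that can be handed directly to, for example, `torch.tensor`, with no need to regroup elements first.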
Make it easy to switch between our output and the torchnlp library in the Python backend.
This should become a project possibly with several subissues.