GateNLP / gateplugin-LearningFramework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNNs through Pytorch and Keras.
https://gatenlp.github.io/gateplugin-LearningFramework/
GNU Lesser General Public License v2.1

Rethink API for sequence encoder and implement a few more #65

Open johann-petrak opened 6 years ago

johann-petrak commented 6 years ago

Currently, sequence encoding is done jointly by the feature extractor and the sequence encoder, and the sequence encoder only sees the class annotations for each instance separately.

We should move the full functionality into the sequence encoder and also make sure that all encoding strategies we want to support get all the data they need, which may include the class annotations from the previous instance, the labels generated for the previous instance, or even a completely different approach.

To figure this out, start implementing a number of commonly used sequence encoding strategies and think about how to deal with overlapping class annotations for the same or different classes.
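As a starting point for that discussion, here is a minimal sketch of two commonly used schemes, BIO and BILOU, written as functions that see the whole sequence at once rather than one instance at a time. It is only an illustration in Python, not the plugin's API: the names `encode_bio` and `encode_bilou`, the `(start, end, class)` span tuples, and the rule of dropping a span that overlaps an already labelled token are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical span representation: (start token index, end token index
# exclusive, class name). This is an assumption, not the plugin's data model.
Span = Tuple[int, int, str]


def _check(n_tokens: int, span: Span) -> None:
    start, end, _ = span
    if not 0 <= start < end <= n_tokens:
        raise ValueError(f"span {span} does not fit a sequence of {n_tokens} tokens")


def encode_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Label every token with B-<class>, I-<class>, or O."""
    labels = ["O"] * n_tokens
    for span in sorted(spans):
        _check(n_tokens, span)
        start, end, cls = span
        # Assumed overlap policy: the span that sorts first wins and any
        # later span touching an already labelled token is dropped.
        if any(label != "O" for label in labels[start:end]):
            continue
        labels[start] = "B-" + cls
        for i in range(start + 1, end):
            labels[i] = "I-" + cls
    return labels


def encode_bilou(n_tokens: int, spans: List[Span]) -> List[str]:
    """Like BIO, but marks the last token (L-) and single-token spans (U-)."""
    labels = ["O"] * n_tokens
    for span in sorted(spans):
        _check(n_tokens, span)
        start, end, cls = span
        if any(label != "O" for label in labels[start:end]):
            continue
        if end - start == 1:
            labels[start] = "U-" + cls
        else:
            labels[start] = "B-" + cls
            for i in range(start + 1, end - 1):
                labels[i] = "I-" + cls
            labels[end - 1] = "L-" + cls
    return labels


if __name__ == "__main__":
    tokens = ["John", "Smith", "visited", "Paris"]
    spans = [(0, 2, "PER"), (3, 4, "LOC")]
    print(encode_bio(len(tokens), spans))
    print(encode_bilou(len(tokens), spans))
```

Both functions need the full list of spans for the sequence, which is the kind of information the encoder does not get today. The overlap policy used here silently loses annotations, so it is a placeholder for whatever we decide about overlapping class annotations.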

johann-petrak commented 6 years ago

Schemes to consider implementing:

Ratinov and Roth (2009), "Design challenges and misconceptions in named entity recognition", CoNLL, evaluates different approaches.

However, we should also check which scheme is best suited for encoding entity annotations that overlap, either arbitrarily or in a constrained way. Check out the approaches and evaluations on the Genia corpus!
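One possible direction for overlaps between different classes, sketched here only to make the question concrete, is to keep a separate label sequence per class so that annotations of different classes never compete for the same token label. This is an assumption for discussion, not something taken from Ratinov and Roth or from any Genia evaluation; the function name `encode_per_class` and the `(start, end, class)` span tuples are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical span representation: (start token index, end token index
# exclusive, class name).
Span = Tuple[int, int, str]


def encode_per_class(n_tokens: int, spans: List[Span]) -> Dict[str, List[str]]:
    """Build one independent B/I/O label sequence for each class."""
    layers: Dict[str, List[str]] = {}
    for start, end, cls in sorted(spans):
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span {(start, end, cls)} is out of range")
        labels = layers.setdefault(cls, ["O"] * n_tokens)
        # Overlaps within the same class are still unresolved in this
        # sketch: the span that sorts first wins, later ones are dropped.
        if any(label != "O" for label in labels[start:end]):
            continue
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return layers


if __name__ == "__main__":
    # Five tokens with a "DNA" span nested inside a "Protein" span.
    for cls, labels in encode_per_class(5, [(0, 4, "Protein"), (1, 3, "DNA")]).items():
        print(cls, labels)
```

This keeps nested annotations of different classes intact at the cost of one label sequence (and possibly one model or output layer) per class, and it does nothing for overlapping annotations of the same class, so it only covers part of the problem.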