Open johann-petrak opened 6 years ago
Schemes to consider implementing:
Ratinov and Roth (2009), "Design challenges and misconceptions in named entity recognition", CoNLL, has an evaluation of different approaches.
However, we should also check which scheme is best suited for encoding overlapping entity annotations, whether the overlap is arbitrary or constrained. Check out approaches and evaluations on the GENIA corpus!
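For reference, two of the schemes compared in that paper are BIO and BILOU. The following is only a minimal sketch of both, assuming non-overlapping spans given as token offsets; the `Span` type and the function names `encode_bio` and `encode_bilou` are made up for illustration and are not part of the current code.

```python
from typing import List, Tuple

# Hypothetical span representation: (start token, end token exclusive, class name)
Span = Tuple[int, int, str]


def encode_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """BIO: B- marks the first token of an entity, I- every following token."""
    labels = ["O"] * n_tokens
    for start, end, cls in spans:
        labels[start] = f"B-{cls}"
        for i in range(start + 1, end):
            labels[i] = f"I-{cls}"
    return labels


def encode_bilou(n_tokens: int, spans: List[Span]) -> List[str]:
    """BILOU: adds L- for the last token and U- for single-token entities."""
    labels = ["O"] * n_tokens
    for start, end, cls in spans:
        if end - start == 1:
            labels[start] = f"U-{cls}"
        else:
            labels[start] = f"B-{cls}"
            for i in range(start + 1, end - 1):
                labels[i] = f"I-{cls}"
            labels[end - 1] = f"L-{cls}"
    return labels


# Tokens: "John Smith visited Paris"
spans = [(0, 2, "PER"), (3, 4, "LOC")]
print(encode_bio(4, spans))    # ['B-PER', 'I-PER', 'O', 'B-LOC']
print(encode_bilou(4, spans))  # ['B-PER', 'L-PER', 'O', 'U-LOC']
```

Neither function can represent overlapping spans: a later span would silently overwrite the labels of an earlier one.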
Currently the sequence encoding is really done by the feature extractor and the sequence encoder jointly, and the sequence encoder only sees the class annotations for each instance separately.
We should move the full functionality into the sequence encoder and also make sure that all encoding strategies we want to support get all the data they need. This may include the class annotations from the previous instance or the labels generated for the previous instance, and some strategies may need a completely different approach.
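To illustrate why the encoder needs more than the current instance, here is a rough sketch of an encoder that receives the whole sequence and compares each instance with the previous one. The class name `BioSequenceEncoder`, the `Covering` type and the idea of passing annotation ids are assumptions for this example only, not the existing API.

```python
from typing import List, Optional, Tuple

# Hypothetical input: for each instance, the (annotation id, class name) of the
# class annotation covering it, or None if the instance is not covered.
Covering = Optional[Tuple[int, str]]


class BioSequenceEncoder:
    """Sketch of an encoder that sees the full sequence instead of one instance."""

    def encode(self, coverings: List[Covering]) -> List[str]:
        labels: List[str] = []
        prev: Covering = None
        for cur in coverings:
            if cur is None:
                labels.append("O")
            elif cur == prev:
                # Same annotation as the previous instance: continue the entity.
                labels.append(f"I-{cur[1]}")
            else:
                # New annotation, even if it has the same class as the previous one.
                labels.append(f"B-{cur[1]}")
            prev = cur
        return labels


encoder = BioSequenceEncoder()
print(encoder.encode([(1, "PER"), (1, "PER"), (2, "PER"), None, (3, "LOC")]))
# ['B-PER', 'I-PER', 'B-PER', 'O', 'B-LOC']
```

The third instance gets `B-PER` only because the encoder can see that the annotation id changed. An encoder that sees just the class of each instance separately cannot distinguish two adjacent entities of the same class.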
To figure this out, start implementing a number of commonly used sequence encoding strategies and think about how to deal with overlapping class annotations for the same or different classes.
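One possible option for overlaps between different classes, given here only as a starting point for discussion, is to encode one label layer per class. The sketch below assumes spans as token offsets and uses a made-up function name `encode_layers`; it handles overlaps between different classes but deliberately rejects overlapping annotations of the same class, which would need a different strategy.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical span representation: (start token, end token exclusive, class name)
Span = Tuple[int, int, str]


def encode_layers(n_tokens: int, spans: List[Span]) -> Dict[str, List[str]]:
    """Encode each class in its own BIO layer so different classes may overlap."""
    by_class = defaultdict(list)
    for start, end, cls in spans:
        by_class[cls].append((start, end))

    layers: Dict[str, List[str]] = {}
    for cls, cls_spans in by_class.items():
        labels = ["O"] * n_tokens
        for start, end in sorted(cls_spans):
            # A single layer cannot hold two overlapping spans of the same class.
            if any(label != "O" for label in labels[start:end]):
                raise ValueError(f"overlapping {cls} annotations at tokens {start}-{end}")
            labels[start] = "B"
            for i in range(start + 1, end):
                labels[i] = "I"
        layers[cls] = labels
    return layers


# A DNA annotation nested inside a PROTEIN annotation
layers = encode_layers(5, [(0, 3, "PROTEIN"), (1, 2, "DNA")])
print(layers["PROTEIN"])  # ['B', 'I', 'I', 'O', 'O']
print(layers["DNA"])      # ['O', 'B', 'O', 'O', 'O']
```

The cost of this design is one sequence model (or one output layer) per class, so it may not scale well to many classes.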