JohnGiorgi / seq2rel-ds

This is a companion repository to seq2rel (https://github.com/JohnGiorgi/seq2rel) which aims to make it easy to generate training data.

Retain all offsets #21

Closed JohnGiorgi closed 3 years ago

JohnGiorgi commented 3 years ago

Previously, parse_pubtator retained only the character offsets of an entity's first mention rather than the offsets of every mention. Although seq2rel itself only uses the offsets of the first mention, it's important to retain all offsets for other purposes (e.g. computing corpus statistics). I fixed this and updated the tests to check that all offsets are retained.
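
To illustrate the idea, here is a minimal sketch of keeping every offset per entity while still being able to recover the first mention. This is not the actual parse_pubtator code: the function name `collect_offsets` and the sample document are hypothetical, and the sketch assumes the standard PubTator layout in which each mention line has six tab-separated fields (PMID, start, end, mention text, entity type, entity ID).

```python
from collections import defaultdict


def collect_offsets(pubtator_lines):
    """Map each entity ID to every (start, end) offset at which it is mentioned.

    Hypothetical helper for illustration; not the seq2rel-ds implementation.
    """
    offsets = defaultdict(list)
    for line in pubtator_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: mention lines have exactly six tab-separated fields.
        # Title/abstract lines ("PMID|t|...") and relation lines are skipped.
        if len(fields) != 6:
            continue
        _, start, end, _, _, entity_id = fields
        offsets[entity_id].append((int(start), int(end)))
    return dict(offsets)


# Made-up example document, used only to exercise the sketch.
example = [
    "123|t|Aspirin and headache.",
    "123|a|Aspirin relieves headache.",
    "123\t0\t7\tAspirin\tChemical\tD001241",
    "123\t12\t20\theadache\tDisease\tD006261",
    "123\t22\t29\tAspirin\tChemical\tD001241",
    "123\t39\t47\theadache\tDisease\tD006261",
]

all_offsets = collect_offsets(example)
# The first mention is still available when only that is needed.
first_offsets = {eid: spans[0] for eid, spans in all_offsets.items()}
```

Storing the full list and deriving the first mention from it means downstream code that needs only the first offset loses nothing, while corpus statistics (e.g. mentions per entity) can be computed from the same structure.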