Previously, in `parse_pubtator`, we were retaining only the character offsets of the first appearance of an entity, not all offsets. Although we only use the offsets of the first mention for seq2rel, it's important to retain all offsets for other purposes (e.g., computing corpus statistics). I fixed this and updated the tests to check that all offsets are retained.
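To illustrate the behaviour described above, here is a minimal sketch of collecting every mention offset per entity. It is not the actual `parse_pubtator` code: the helper name `collect_entity_offsets`, the example lines, and the identifiers are hypothetical, and it assumes the usual six-field, tab-separated PubTator mention lines (PMID, start, end, text, type, identifier).

```python
from collections import defaultdict


def collect_entity_offsets(pubtator_lines):
    """Map each entity identifier to the offsets of ALL of its mentions.

    Hypothetical helper for illustration only; not the project's parser.
    """
    offsets = defaultdict(list)
    for line in pubtator_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed mention format: PMID, start, end, text, type, identifier.
        # Title/abstract lines and relation lines have a different field
        # count, so they are skipped here.
        if len(fields) != 6:
            continue
        _, start, end, _, _, identifier = fields
        offsets[identifier].append((int(start), int(end)))
    return dict(offsets)


# Made-up document: the first entity is mentioned twice.
example_lines = [
    "1|t|Aspirin relieves headache in most cases; aspirin is cheap.",
    "1\t0\t7\tAspirin\tChemical\tD000001",
    "1\t17\t25\theadache\tDisease\tD000002",
    "1\t41\t48\taspirin\tChemical\tD000001",
    "1\tCID\tD000001\tD000002",
]

all_offsets = collect_entity_offsets(example_lines)
# The first mention (what seq2rel uses) is still the first list element.
first_offsets = {ident: spans[0] for ident, spans in all_offsets.items()}
```

Keeping the full list and taking the first element when needed means one parse can serve both the seq2rel use case and corpus statistics.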