This PR is focused on improving the sorting of relations. We still order relations according to their first appearance in the text, with some subtle changes:
An entity's offset is now the sum of the start and end character offsets of its first mention. Previously, we used only the end character offset. The sum is more informative for overlapping/nested entities, which can share the same end offset. The same applies to entity hints.
Relations are now sorted by the order of their head entities first. Ties are then broken by the tail entity, and so on through the remaining arguments for n-ary relations.
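A minimal sketch of the two changes above, not the actual implementation in this PR. The data layout and the names `entity_offset` and `sort_relations` are hypothetical: entities are assumed to map to lists of `(start, end)` character spans, relations are assumed to be tuples of entity ids in head, tail, ... order, and the "first mention" is assumed to be the span with the smallest `(start, end)` pair.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start_char, end_char) of one mention (assumed layout)


def entity_offset(mentions: List[Span]) -> int:
    """Offset of an entity: start + end of its first mention."""
    # Assumption: the first mention is the smallest (start, end) pair.
    start, end = min(mentions)
    return start + end


def sort_relations(
    relations: List[Tuple[str, ...]],
    entities: Dict[str, List[Span]],
) -> List[Tuple[str, ...]]:
    """Sort relations by head offset, then tail offset, and so on."""

    def key(relation: Tuple[str, ...]) -> Tuple[int, ...]:
        # Tuples compare lexicographically, so the head is compared first,
        # then the tail, then any further arguments of an n-ary relation.
        return tuple(entity_offset(entities[arg]) for arg in relation)

    return sorted(relations, key=key)


# "B" is nested inside "A" and shares its end offset (5), so an end-only
# offset would tie them. With start + end, A gets 5 and B gets 8.
entities = {"A": [(0, 5)], "B": [(3, 5)], "C": [(10, 14)]}
relations = [("C", "A"), ("B", "C"), ("A", "C"), ("A", "B")]
sorted_relations = sort_relations(relations, entities)
```

In this toy example the nested entities "A" and "B" would be indistinguishable under the old end-only offset, while the summed offset places "A" first.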
The idea is that this ordering might be easier for a model to learn and could therefore improve performance. In practice, the results are a bit of a mixed bag, but these changes mostly improve (or at least don't harm) performance.