The base annotator filters each annotation based on `_is_allowed_span`:
https://github.com/NorskRegnesentral/skweak/blob/fba1037399121d5468187aac746f52cb57bc8d31/skweak/base.py#L88

However, implementations such as `TokenConstraintAnnotator` perform additional filtering: they only yield the longest span. This means that when the longest span violates `_is_allowed_span` but a shorter, overlapping span is valid, the shorter span is never considered.

I think the logic should really be to return the longest valid spans, which means `_is_allowed_span` needs to be called in the `find_spans` method and not in `__call__` of the base class.

A workaround seems to be to add the `name` of the annotator itself to `incompatible_sources`, and then yield the candidate spans in order of descending length. That way it will return spans that satisfy both constraints.