If you call `AnnotationSetImpl.remove(annotation)` after the `nodesByOffset` map has been built by `AnnotationSetImpl.indexByStartOffset()`, the annotation is removed from the set but `nodesByOffset` isn't recomputed.
An example of where this is a problem is as follows. Let's assume you are processing tweets and have found an @mention. You then decide to simplify the Token annotations so that, instead of potentially many (@mentions can contain numbers and underscores, which would be separate tokens), there are just two: one over the @ and one over the rest. So you write something like three lines of code which should:

1. get all the Token annotations within `matchAnnots` (which I'm assuming is the UserMention annotation)
2. get the Token annotation that starts at the offset of the beginning of `matchAnnots`, i.e. the Token spanning the @
3. remove the Token over the @ from the annotation set (but not the document)
This works as expected, in that between the first and last line of code the `tokens` annotation set shrinks by one annotation. The problem is that if you then test whether the first node of `tokens` still sits at the start offset of `matchAnnots`, keeping the result in `aligned`, you'll find that `aligned` is true. The `nodesByOffset` map that powers `firstNode` wasn't updated when the annotation was removed, so it still points to the node before the @ rather than the node after the @, which is where the earliest annotation remaining in the `tokens` annotation set begins.
Depending on what your code does next, this may or may not be a problem, but if you make any use of `firstNode` then you'll get the wrong result. Similarly, removing the last annotation from a set would have a similar effect on the result of `lastNode`.
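To make the failure mode concrete without depending on GATE, here is a minimal toy model of it. This is only a sketch and not GATE's implementation: `StaleIndexDemo`, `Span`, `ToySet`, `indexedOffsets` and the `liveFirstOffset`/`liveLastOffset` helpers are hypothetical names made up for illustration, and the offsets are invented. The set builds its offset index lazily, `remove` leaves that index untouched, and so `firstNode` keeps reporting the offset of the removed `@` token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;
import java.util.TreeSet;

public class StaleIndexDemo {

    /** Hypothetical stand-in for an annotation: a type plus start/end offsets. */
    public static final class Span {
        final String type;
        final long start;
        final long end;

        public Span(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy annotation set whose offset index is built once and never refreshed. */
    public static final class ToySet {
        private final List<Span> spans = new ArrayList<>();
        // Plays the role of nodesByOffset in this model; null until first needed.
        private TreeSet<Long> indexedOffsets;

        public ToySet(List<Span> initial) {
            spans.addAll(initial);
        }

        private void indexByStartOffset() {
            if (indexedOffsets != null) {
                return; // already built, never rebuilt
            }
            indexedOffsets = new TreeSet<>();
            for (Span s : spans) {
                indexedOffsets.add(s.start);
                indexedOffsets.add(s.end);
            }
        }

        /** Removes the span from the set but deliberately leaves the index alone. */
        public boolean remove(Span s) {
            return spans.remove(s);
        }

        public int size() {
            return spans.size();
        }

        /** Answers from the (possibly stale) index. */
        public long firstNode() {
            indexByStartOffset();
            return indexedOffsets.first();
        }

        public long lastNode() {
            indexByStartOffset();
            return indexedOffsets.last();
        }

        /** Workaround sketch: derive the bounds from the spans still in the set. */
        public OptionalLong liveFirstOffset() {
            return spans.stream().mapToLong(s -> s.start).min();
        }

        public OptionalLong liveLastOffset() {
            return spans.stream().mapToLong(s -> s.end).max();
        }
    }

    public static void main(String[] args) {
        // "@user_1" at offsets 10..17, tokenised as "@", "user", "_", "1".
        Span at = new Span("Token", 10, 11);
        ToySet tokens = new ToySet(Arrays.asList(
                at,
                new Span("Token", 11, 15),
                new Span("Token", 15, 16),
                new Span("Token", 16, 17)));
        long mentionStart = 10;

        tokens.firstNode();  // forces the index to be built
        tokens.remove(at);   // set shrinks, index does not change

        boolean aligned = tokens.firstNode() == mentionStart;
        System.out.println("aligned (stale index) = " + aligned);
        System.out.println("live first offset = " + tokens.liveFirstOffset().getAsLong());
    }
}
```

In this model `aligned` comes out true even though the earliest remaining token starts at offset 11. Until the index is refreshed on removal, deriving the bounds from the annotations still in the set, as the `live*` helpers do here, is one way to avoid relying on the stale value.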