Original issue 395 created by ClearTK on 2013-12-08T15:15:59.000Z:
I'm a little surprised by the default behavior of the Bag context when the specified range of its context annotations goes "out-of-bounds" - i.e. past the last annotation. If the specified range goes past the last token in the JCas, then "out-of-bounds" features will be generated. Such features have values whose prefix is {{{OOB}}} followed by a number indicating how far out of range the feature is. I think this is pretty confusing default behavior. Imagine that you generate 50 features per bag. When you get to the end of your token annotations, you will end up with features with the values OOB1, OOB2, ... OOB49. Yikes! To me, it seems that the default behavior should be to filter out OOB features for the Bag context. When those features are desired, then it seems like they should not be indexed.
[Steve]
The Bag context has no concept of in or out of bounds. All it does is strip off the position information generated by other contexts. So if you're seeing out-of-bounds stuff, it's from the other contexts, not from Bag.
That said, Bag strips the position by taking the .feature field of a ContextFeature, and that .feature field is a little bit strange for out-of-bounds features. If you want to mess around with this, look at the ContextFeature(String, int, int, String) constructor.
I agree that the interaction of ContextFeature, Bag and other contexts probably isn't what you would have expected. Where the fix belongs, I'm not 100% sure.
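To make the interaction described above concrete, here is a minimal, self-contained sketch. It does not use ClearTK's classes: {{{OobBagSketch}}}, {{{PositionedValue}}}, {{{following}}}, {{{bag}}} and {{{bagWithoutOob}}} are hypothetical names invented for illustration, and the real ContextFeature may behave differently. The sketch assumes a positional context that pads past the last token with OOB1, OOB2, ..., a bag step that only drops the position, and an optional filter of the kind the issue asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OobBagSketch {

    /** Hypothetical stand-in for a positional context feature (not ClearTK's ContextFeature). */
    public static final class PositionedValue {
        public final int position;        // 0-based offset inside the context window
        public final String value;        // token text, or "OOB<n>" past the last token
        public final boolean outOfBounds; // true when the window ran past the last token

        public PositionedValue(int position, String value, boolean outOfBounds) {
            this.position = position;
            this.value = value;
            this.outOfBounds = outOfBounds;
        }
    }

    /** Collects `count` values after index `focus`, padding with OOB1, OOB2, ... */
    public static List<PositionedValue> following(List<String> tokens, int focus, int count) {
        List<PositionedValue> out = new ArrayList<>();
        int distance = 0; // how far past the last token we are
        for (int i = 0; i < count; i++) {
            int index = focus + 1 + i;
            if (index < tokens.size()) {
                out.add(new PositionedValue(i, tokens.get(index), false));
            } else {
                distance++;
                out.add(new PositionedValue(i, "OOB" + distance, true));
            }
        }
        return out;
    }

    /** Bag-style step: drops the position and keeps the value, OOB placeholders included. */
    public static List<String> bag(List<PositionedValue> features) {
        return features.stream().map(f -> f.value).collect(Collectors.toList());
    }

    /** Variant proposed in the issue: drop out-of-bounds placeholders before bagging. */
    public static List<String> bagWithoutOob(List<PositionedValue> features) {
        return features.stream()
                .filter(f -> !f.outOfBounds)
                .map(f -> f.value)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "fox");
        // Ask for 4 tokens after "quick"; only "fox" exists, so 3 slots are out of bounds.
        List<PositionedValue> window = following(tokens, 1, 4);
        System.out.println(bag(window));           // [fox, OOB1, OOB2, OOB3]
        System.out.println(bagWithoutOob(window)); // [fox]
    }
}
```

The filter keys on an explicit flag rather than on the "OOB" string prefix, so a real token that happens to start with "OOB" is not dropped by accident. Whether such a filter belongs in the bag step or in the positional context is exactly the open question raised above.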