ClearTK / cleartk

Machine learning components for Apache UIMA
http://cleartk.github.io/cleartk/

default behavior of Bag context for out-of-bounds annotations #393

Open bethard opened 9 years ago

bethard commented 9 years ago

Original issue 395 created by ClearTK on 2013-12-08T15:15:59.000Z:

I'm a little surprised by the default behavior of the Bag context when the specified range of its context annotations goes "out-of-bounds", i.e. past the last annotation. If the specified range goes past the last token in the JCas, then "out-of-bounds" features will be generated. Such features have names whose prefix is OOB followed by a number corresponding to how far out of range the feature is.

I think this is pretty confusing default behavior. You can imagine that you might generate 50 features per bag. When you get to the end of your token annotations, you will end up with features with the values OOB1, OOB2, ... OOB49. Yikes! To me, it seems that the default behavior should be to filter out OOB features for the Bag context. When those features are desired, it seems like they should not be indexed.

[Steve] The Bag context has no concept of in or out of bounds. All it does is strip off the position information generated by other contexts. So if you're seeing out-of-bounds stuff, it's from the other contexts, not from Bag.

That said, Bag strips the position by taking the .feature field of a ContextFeature, and that .feature field is a little bit strange for out-of-bounds features. If you want to mess around with this, look at the ContextFeature(String, int, int, String) constructor.

I agree that the interaction of ContextFeature, Bag and other contexts probably isn't what you would have expected. Where the fix belongs, I'm not 100% sure.
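To make the interaction concrete, here is a rough, self-contained sketch of what seems to be going on. It does not use the ClearTK API: the class BagOobSketch, the PositionalFeature type, and the following/bag methods are all hypothetical stand-ins, and the dropOob flag is just one possible form the requested filtering could take, not something ClearTK is known to provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the context machinery; none of these names are ClearTK API.
class BagOobSketch {

  /** A feature produced by a positional context: a position plus a value. */
  static final class PositionalFeature {
    final int position;        // offset within the context window
    final String value;        // token text, or an "OOB<n>" marker
    final boolean outOfBounds; // true when the window ran past the last token

    PositionalFeature(int position, String value, boolean outOfBounds) {
      this.position = position;
      this.value = value;
      this.outOfBounds = outOfBounds;
    }
  }

  /** Mimics a "following" context: the n tokens after index focus, with OOB markers past the end. */
  static List<PositionalFeature> following(List<String> tokens, int focus, int n) {
    List<PositionalFeature> features = new ArrayList<>();
    int oobDistance = 0;
    for (int i = 0; i < n; i++) {
      int index = focus + 1 + i;
      if (index < tokens.size()) {
        features.add(new PositionalFeature(i, tokens.get(index), false));
      } else {
        oobDistance++; // how far past the last token this slot is
        features.add(new PositionalFeature(i, "OOB" + oobDistance, true));
      }
    }
    return features;
  }

  /** Mimics Bag: drops the position and keeps only the value; optionally filters OOB markers. */
  static List<String> bag(List<PositionalFeature> features, boolean dropOob) {
    List<String> values = new ArrayList<>();
    for (PositionalFeature feature : features) {
      if (dropOob && feature.outOfBounds) {
        continue; // assumed filtering behavior, as requested in the issue
      }
      values.add(feature.value);
    }
    return values;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("the", "end", ".");
    // A window of 3 tokens after "end" runs two slots past the last token.
    System.out.println(bag(following(tokens, 1, 3), false));
    System.out.println(bag(following(tokens, 1, 3), true));
  }
}
```

In this sketch the bag itself never decides what is out of bounds. It only forwards whatever values the positional context produced, which is why each OOB distance shows up as its own feature unless something filters it.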

bethard commented 9 years ago

Comment #1 originally posted by ClearTK on 2013-12-08T15:18:14.000Z:

I was wondering if you get the same features when you use the extractWithin method. You do.

bethard commented 9 years ago

Comment #2 originally posted by ClearTK on 2014-03-15T17:41:52.000Z:

<empty>