fangfangli / cleartk

Automatically exported from code.google.com/p/cleartk

should SimpleFeatureExtractors have a getName()? #244

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
While working on ContextExtractor, I noticed one inconvenience: "out of bounds" 
features can't have the same names as their "in bounds" counterparts. From 
ContextExtractorTest:

    this.assertFeature("Bag_Preceding_3_6", "OOB1", iter.next());
    this.assertFeature("Bag_Preceding_3_6_TypePath(Pos)", "DT", iter.next());

The reason is that the only way to get the name a feature extractor uses for 
its features is to inspect Feature.getName() on each individual feature, and 
an out-of-bounds position has no extracted feature to inspect. This would not 
be a problem if there were a SimpleFeatureExtractor.getName() method that I 
could call instead.

I think the vast majority of our feature extractors are like TypePathExtractor 
in that all features created by the extractor have the same name, and that name 
is static enough that it could be determined at the time the TypePathExtractor 
was constructed.

Maybe we should introduce a subinterface of SimpleFeatureExtractor, say, 
NamedFeatureExtractor, that has a getName method, and retrofit this to all the 
feature extractors where it makes sense?
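
To make the proposal concrete, here is a rough sketch of what such a subinterface could look like. The types below are simplified stand-ins written for this example, not the real ClearTK classes: the extract signature takes a plain String, SuffixExtractor is a hypothetical extractor, and the "_" separator is only inferred from the test names quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the real ClearTK types.
class NamedExtractorSketch {

  /** Minimal feature: a name plus a value. */
  static final class Feature {
    final String name;
    final Object value;

    Feature(String name, Object value) {
      this.name = name;
      this.value = value;
    }
  }

  /** Stand-in for SimpleFeatureExtractor; extracts from a plain token string. */
  interface SimpleFeatureExtractor {
    List<Feature> extract(String token);
  }

  /** The proposed subinterface: the name is available without extracting anything. */
  interface NamedFeatureExtractor extends SimpleFeatureExtractor {
    String getName();
  }

  /** Hypothetical extractor whose feature name is fixed at construction time. */
  static final class SuffixExtractor implements NamedFeatureExtractor {
    private final int length;
    private final String name;

    SuffixExtractor(int length) {
      this.length = length;
      this.name = "Suffix(" + length + ")";
    }

    @Override
    public String getName() {
      return name;
    }

    @Override
    public List<Feature> extract(String token) {
      List<Feature> features = new ArrayList<>();
      int start = Math.max(0, token.length() - length);
      features.add(new Feature(name, token.substring(start)));
      return features;
    }
  }

  /** Names an out-of-bounds feature from the nested extractor's name alone. */
  static Feature outOfBounds(String contextName, NamedFeatureExtractor nested, int distance) {
    return new Feature(contextName + "_" + nested.getName(), "OOB" + distance);
  }

  public static void main(String[] args) {
    NamedFeatureExtractor suffix = new SuffixExtractor(2);
    Feature oob = outOfBounds("Bag_Preceding_3_6", suffix, 1);
    Feature inBounds = suffix.extract("walking").get(0);
    System.out.println(oob.name + " = " + oob.value);
    System.out.println(inBounds.name + " = " + inBounds.value);
  }
}
```

The point of the sketch is that outOfBounds never calls extract: the name comes from getName(), so an out-of-bounds feature can carry the same nested name that an in-bounds feature would.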

Original issue reported on code.google.com by steven.b...@gmail.com on 15 Apr 2011 at 3:59

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 24 Jul 2012 at 5:54

GoogleCodeExporter commented 9 years ago

Original comment by lee.becker on 17 Feb 2013 at 5:13

GoogleCodeExporter commented 9 years ago
Fixed in 7e582e0ae69d00b3e4c2fca824b6101b46ccca87.

Note that this is a backwards-incompatible change: Features from Count, Bag, 
Ngram and Ngrams contexts will now have additional information in their feature 
names if the nested extractor is a SimpleNamedFeatureExtractor. Models using 
such features will need to be rebuilt.

The one exception is a nested SimpleNamedFeatureExtractor whose 
getFeatureName() returns null, like CoveredTextExtractor. In that case, the 
feature names should not change. It's probably still safer to just rebuild the 
models, though.
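
As a rough illustration of the naming rule described above (not the code from the commit), the following sketch shows how a context name and a nested extractor name might combine. The featureName helper is hypothetical, and the "_" separator is an assumption based on names like "Bag_Preceding_3_6_TypePath(Pos)" in ContextExtractorTest.

```java
// Illustrative only: a hypothetical helper, not the actual ClearTK implementation.
class ContextNameSketch {

  /**
   * Combines a context name with the nested extractor's feature name.
   * A null nested name leaves the context name unchanged.
   */
  static String featureName(String contextName, String nestedName) {
    if (nestedName == null) {
      return contextName;
    }
    return contextName + "_" + nestedName;
  }

  public static void main(String[] args) {
    // Nested extractor that reports a name: the name is appended.
    System.out.println(featureName("Bag_Preceding_3_6", "TypePath(Pos)"));
    // Nested extractor whose name is null: the context name is used as-is.
    System.out.println(featureName("Bag_Preceding_3_6", null));
  }
}
```

Since a trained model looks features up by name, any feature whose name gains the extra suffix no longer matches what the model saw during training, which is why rebuilding is the safe option.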

Original comment by steven.b...@gmail.com on 18 Feb 2013 at 2:34