ashokpant / dkpro-tc

Automatically exported from code.google.com/p/dkpro-tc

Issue with the LucenePOSNGramFeatureExtractorBase class #137

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
Hi everyone,

I'm trying to extract the most common POS ngrams with 
LucenePOSNGramFeatureExtractorBase and LucenePOSNGramDFE, the same way I used 
LuceneNgramFeatureExtractorBase to get common token ngrams. However, it doesn't 
find all the POS ngrams. I guess this is because there's no equivalent of 
getTopNgrams in the LucenePOSNGramFeatureExtractorBase class.
It might also be that I misunderstood the purpose of the POS ngram classes. In 
that case, can somebody give me an explanation?

Anil

Original issue reported on code.google.com by narassig...@gmail.com on 5 Jun 2014 at 10:17

GoogleCodeExporter commented 9 years ago
The *Base classes shouldn't be used to specify the feature set; use only the 
classes whose names end in *DFE. 
Apart from that, there might really be an issue with this FeatureExtractor. 
There is no test for it, so a bug could have gone unnoticed.

Original comment by daxenber...@gmail.com on 5 Jun 2014 at 10:48

GoogleCodeExporter commented 9 years ago
LucenePOSNGramDFE uses the variable topKSet (from NGramFeatureExtractorBase), 
which is computed by getTopNgrams() (abstract in NGramFeatureExtractorBase). 
getTopNgrams() is overridden in LuceneNgramFeatureExtractorBase but not in 
LucenePOSNGramFeatureExtractorBase. I guess the issue comes from this.
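
To make the pattern described here concrete, below is a minimal, self-contained sketch of a base class that owns topKSet and fills it from an abstract getTopNgrams(). It is not the DKPro TC code: the class names BaseSketch and FrequencyRankedSketch, the helper topPosNgrams, and the plain frequency ranking are all assumptions made for illustration, and the real extractors read their counts from a Lucene index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: every name below is hypothetical, not DKPro TC API.
class TopNgramsSketch {

    // Stand-in for a base extractor: it owns topKSet but leaves the
    // ranking itself to subclasses via an abstract method.
    static abstract class BaseSketch {
        protected Set<String> topKSet;

        void initialize() {
            // Whatever the subclass returns becomes the feature vocabulary.
            topKSet = getTopNgrams();
        }

        protected abstract Set<String> getTopNgrams();
    }

    // A concrete subclass that supplies the ranking. Without such an
    // override somewhere in the hierarchy, topKSet is never populated.
    static class FrequencyRankedSketch extends BaseSketch {
        private final List<String> ngrams;
        private final int k;

        FrequencyRankedSketch(List<String> ngrams, int k) {
            this.ngrams = ngrams;
            this.k = k;
        }

        @Override
        protected Set<String> getTopNgrams() {
            // Count how often each ngram occurs.
            Map<String, Integer> counts = new HashMap<>();
            for (String ngram : ngrams) {
                counts.merge(ngram, 1, Integer::sum);
            }
            // Sort by descending count, breaking ties alphabetically.
            List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
            entries.sort((a, b) -> {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            });
            // Keep the first k keys in ranked order.
            Set<String> top = new LinkedHashSet<>();
            for (Map.Entry<String, Integer> entry : entries) {
                if (top.size() >= k) {
                    break;
                }
                top.add(entry.getKey());
            }
            return top;
        }
    }

    // Convenience helper: build the extractor, initialize it, return topKSet.
    static Set<String> topPosNgrams(List<String> posNgrams, int k) {
        BaseSketch extractor = new FrequencyRankedSketch(posNgrams, k);
        extractor.initialize();
        return extractor.topKSet;
    }

    public static void main(String[] args) {
        List<String> posNgrams = Arrays.asList(
                "DT_NN", "DT_NN", "NN_VB", "DT_NN", "NN_VB", "VB_DT");
        System.out.println(topPosNgrams(posNgrams, 2));
    }
}
```

The point of the sketch is only that the contents of topKSet depend entirely on which getTopNgrams() implementation the concrete class ends up with.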

Original comment by narassig...@gmail.com on 5 Jun 2014 at 11:45

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r869.

The class hierarchy of the LuceneNgram FeatureExtractors was inconsistent. I 
changed it to be consistent with the LucenePOSNGram FeatureExtractors etc. 
LucenePOSNGramFeatureExtractor has been working fine, as shown in the 
corresponding test, but you need to set different parameters to configure it 
(as also shown in the test). I guess this was causing the unexpected results 
here.

Original comment by daxenber...@gmail.com on 5 Jun 2014 at 2:39

GoogleCodeExporter commented 9 years ago
The problem wasn't LucenePOSNGramFeatureExtractorBase, but rather incorrect 
usage of parameters, I suppose. Please re-open if the problem persists.

Original comment by daxenber...@gmail.com on 5 Jun 2014 at 2:41