google-code-export / dkpro-tc

Automatically exported from code.google.com/p/dkpro-tc

Some lucene ngram Meta Collectors don't record term counts in a document #142

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
This issue was created by revision r878.

Previously, some Meta Collectors recorded an ngram at most once per document,
while others, inconsistently, recorded how many times the ngram occurred in each
document. This matters because most users restrict their list of ngram features
by ngram frequency, and the counting method changes which ngrams come out as
most frequent.
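To illustrate why the two counting methods can select different features, here is a minimal Python sketch. It is not DKPro TC's actual Meta Collector code (which is Java and Lucene-based); the helper names `ngrams`, `collect` and `top_k` and the toy corpus are hypothetical. It aggregates unigram frequencies over a corpus either once per document or once per occurrence, then keeps the top-k ngrams by frequency.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def collect(documents, n, count_per_document):
    """Aggregate ngram frequencies over a corpus (hypothetical helper).

    count_per_document=False: each ngram is counted at most once per
    document (the old behaviour of some collectors).
    count_per_document=True: every occurrence within a document is counted
    (the standardized behaviour described in this issue).
    """
    totals = Counter()
    for tokens in documents:
        grams = ngrams(tokens, n)
        totals.update(grams if count_per_document else set(grams))
    return totals


def top_k(freqs, k):
    """Keep the k most frequent ngrams as a feature set."""
    return {gram for gram, _ in freqs.most_common(k)}


# Toy corpus: one document repeats "the cat", two short documents say "a dog".
docs = [
    "the cat the cat the cat".split(),
    "a dog".split(),
    "a dog".split(),
]

once = collect(docs, 1, count_per_document=False)
counts = collect(docs, 1, count_per_document=True)

print(sorted(top_k(once, 2)))    # [('a',), ('dog',)]
print(sorted(top_k(counts, 2)))  # [('cat',), ('the',)]
```

With the same corpus and the same top-2 cutoff, counting once per document favours ngrams spread over many documents, while counting every occurrence favours ngrams repeated inside a single document, so the resulting feature sets do not overlap at all.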

Now, all Meta Collectors consistently record the number of times an ngram occurs in each text.

Note: This bugfix may change your experiment results.

Original issue reported on code.google.com by EmilyKJa...@gmail.com on 9 Jun 2014 at 10:39

GoogleCodeExporter commented 9 years ago

Original comment by daxenber...@gmail.com on 13 Jun 2014 at 3:18