Closed GoogleCodeExporter closed 9 years ago
This issue was closed by revision r873.
Original comment by torsten....@gmail.com
on 9 Jun 2014 at 2:16
r873 was helpful, but since it doesn't technically solve this Issue's problem,
I am reopening the issue until a solution is committed.
This original issue concerned no usable ngrams (not empty text), which should
probably be solved in all affected MetaCollectors by first collecting the
ngrams (in this case, charNGrams):
public void process(JCas jcas)
throws AnalysisEngineProcessException
{
FrequencyDistribution<String> charNGrams = NGramUtils.getCharacterSkipNgrams(
jcas, ngramLowerCase, minN, maxN, skipSize);
and then, if the ngrams fd isn't empty, creating the lucene document:
initializeDocument(jcas);
Original comment by EmilyKJa...@gmail.com
on 9 Jun 2014 at 3:33
Actually, it looks like process() should be refactored away; no reason to
implement this fix 8 times, when the only thing that changes is the contents of
the fd and the Field name.
Original comment by EmilyKJa...@gmail.com
on 9 Jun 2014 at 3:46
First, process() cannot be "refactored away" as this is exactly where the
functionality of the different lucene-based meta collectors are implemented.
Second, could you please describe a bit more clearly why the fix doesn't solve
the problem?
Original comment by torsten....@gmail.com
on 9 Jun 2014 at 4:05
1. Since the only things that change between the (single doc) metacollectors
are the contents of a single fd and also the Field name, it looks reasonable to
put process() into LuceneBasedMetaCollector and have this process() call 2
abstract methods that fetch the contents of the fd and the Field name.
Benefit: Instantiating the lucene doc would be hidden from the user, less
exposed for copy-paste errors, and there would be less code duplication.
Drawback: If someone needed a more customized lucene ngram metacollector, they
would have to override the (proposed) LuceneBasedMetaCollector.process() and
implement their own version. Also, there would need to be a
LuceneBasedPairmetaCollector for an equivalent for the pairs metacollectors
2. On second look, you're right, the solution is fine. I mis-interpreted
"document" in the commit notes.
Original comment by EmilyKJa...@gmail.com
on 9 Jun 2014 at 4:31
Regarding 1)
If you think that this makes things clearer, please go ahead with the
refactoring.
Original comment by torsten....@gmail.com
on 9 Jun 2014 at 4:47
Original comment by daxenber...@gmail.com
on 13 Jun 2014 at 3:18
Original issue reported on code.google.com by
EmilyKJa...@gmail.com
on 9 Jun 2014 at 1:45Attachments: