GoogleCodeExporter closed this issue 9 years ago
I agree this looks like a bug, but I could not figure out how to write a test
that provokes the error you're seeing. Could you propose an additional test for
TreebankGoldReaderAndAnnotatorTest?
Original comment by steven.b...@gmail.com
on 11 Mar 2011 at 8:18
This may be less of a bug and more of a gap in my understanding of how uimaFIT
works with views. I'll let you be the judge. The following modified version of
your test case reproduces the error I observed. (Hopefully this formats nicely)
{{{
@Test
public void testWhenDefaultViewDocumentTextIsSet() throws Exception {
    String treebankParse = "( (X (NP (NP (NML (NN Complex ) (NN trait )) (NN analysis )) (PP (IN of ) (NP (DT the ) (NN mouse ) (NN striatum )))) (: : ) (S (NP-SBJ (JJ independent ) (NNS QTLs )) (VP (VBP modulate ) (NP (NP (NN volume )) (CC and ) (NP (NN neuron ) (NN number)))))) )";
    // String expectedText = "Complex trait analysis of the mouse striatum: independent QTLs modulate volume and neuron number";
    String expectedText = "Complex trait analysis of the mouse striatum : independent QTLs modulate volume and neuron number";

    /* set the document text for the default view as it might be set by a
       collection reader, e.g. {@link FilesCollectionReader} */
    JCas view = ViewCreatorAnnotator.createViewSafely(jCas, CAS.NAME_DEFAULT_SOFA);
    view.setSofaDataString(expectedText, "text/plain");

    AnalysisEngine engine = AnalysisEngineFactory.createPrimitive(TreebankGoldAnnotator.class,
            typeSystemDescription);
    TreebankGoldAnnotator treebankGoldAnnotator = new TreebankGoldAnnotator();
    treebankGoldAnnotator.initialize(engine.getUimaContext());

    JCas tbView = jCas.createView(TreebankConstants.TREEBANK_VIEW);
    tbView.setDocumentText(treebankParse);

    // treebankGoldAnnotator.process(jCas);
    engine.process(jCas);

    JCas goldView = jCas.getView(CAS.NAME_DEFAULT_SOFA);
    FSIndex<Annotation> sentenceIndex = goldView.getAnnotationIndex(Sentence.type);
    assertEquals(1, sentenceIndex.size());
    Sentence firstSentence = JCasUtil.selectByIndex(goldView, Sentence.class, 0);
    assertEquals(expectedText, firstSentence.getCoveredText());
}
}}}
Note the changes to expectedText (I've simply added extra spaces to make it
different from what TreebankFormatParser.inferPlainText() produces) and the
setting of the document text for the default view. The real difference,
however, is the commenting out of
treebankGoldAnnotator.process(jCas);
and the addition of
engine.process(jCas);
Using treebankGoldAnnotator.process(jCas), the test passes and all is fine.
Using engine.process(jCas), which is what I was using when I ran into this
issue, results in an exception (org.apache.uima.cas.CASRuntimeException: Data
for Sofa feature setLocalSofaData() has already been set.)
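For readers unfamiliar with that exception: UIMA allows the data of a Sofa to be set only once, so a second write to a view that already has text is rejected. The following is a minimal plain-Java sketch of that behavior, not UIMA code. The `View` class, the `setTextIfAbsent` helper, and the exception type are all hypothetical stand-ins invented for illustration, and the guard shown is only one possible way to avoid the second write, not necessarily the fix discussed in this thread.

```java
public class SetOnceSofaSketch {

    /** Hypothetical stand-in for a CAS view whose text may be set only once. */
    static final class View {
        private String documentText;

        String getDocumentText() {
            return documentText;
        }

        void setDocumentText(String text) {
            // Mimics the "already been set" failure: a second write is rejected.
            if (documentText != null) {
                throw new IllegalStateException("Data for this view has already been set.");
            }
            documentText = text;
        }
    }

    /** Hypothetical guard: write the text only if the view has none yet. */
    static void setTextIfAbsent(View view, String text) {
        if (view.getDocumentText() == null) {
            view.setDocumentText(text);
        }
    }

    public static void main(String[] args) {
        View defaultView = new View();

        // First write, as a collection reader might do.
        defaultView.setDocumentText("text set by a collection reader");

        // Unguarded second write, as an annotator might attempt.
        try {
            defaultView.setDocumentText("text inferred from the treebank parse");
        } catch (IllegalStateException e) {
            System.out.println("second write rejected: " + e.getMessage());
        }

        // Guarded write leaves the existing text untouched.
        setTextIfAbsent(defaultView, "text inferred from the treebank parse");
        System.out.println(defaultView.getDocumentText());
    }
}
```

In this toy model the unguarded second write fails while the guarded one is a no-op, which mirrors the situation in the test above where the default view already carries text before the annotator runs.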
The suggested fix I mentioned in my initial posting resolves this issue when
using engine.process(jCas). I'm now wondering if this is not necessarily a bug,
but a lack of understanding on my part.
Can you perhaps shed some light as to why the separate initialization of a
TreebankGoldAnnotator (lines 63 and 64 in TreebankGoldReaderAndAnnotatorTest)
is necessary and what those lines do that
AnalysisEngineFactory.createPrimitive(TreebankGoldAnnotator.class,
typeSystemDescription) does not?
Thanks,
Bill
Original comment by bill.bau...@gmail.com
on 11 Mar 2011 at 6:37
Bill, Yep - this is a bug. Thanks for pointing it out and providing the test.
This is somewhat confusing because the default CAS is used for the docView.
So, you might expect that calling jCas.getDocumentText() would work anyway.
See the javadoc for the SofaCapability annotation definition for an
explanation.
I have fixed this in r2794.
Original comment by phi...@ogren.info
on 13 Mar 2011 at 3:53
Original issue reported on code.google.com by
bill.bau...@gmail.com
on 11 Mar 2011 at 12:03