google-code-export / uimafit

Automatically exported from code.google.com/p/uimafit

UIMA .subiterator no longer works as expected if we use uimaFIT #132

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
If the components are wired together with uimaFIT, UIMA's .subiterator() does 
not return the tokens you would expect.
See the attached TokensTest.java for an example.
// There seems to be strange behavior if we use uimaFIT to wire up the components:
// UIMA's .subiterator() no longer returns all of the tokens.
// However, if we use uimaFIT's selectCovered(), it works fine.
// Note: if we use UIMA alone with CVD and the XML descriptor, it works fine.

Original issue reported on code.google.com by peistat...@gmail.com on 14 Aug 2012 at 3:13

GoogleCodeExporter commented 9 years ago
This looks like you're not configuring your type priorities how you want them, 
which is something you have to do if you're going to use subiterators. Here's a 
clearer test that demonstrates your issue. Only 
"testSubiteratorWithNoPriorities" fails for me.

public class Issue132Test {

  @Test
  public void testSelectCoveredWithNoPriorities() throws Exception {
    JCas jCas = getJCas();
    for (Annotation sentence : jCas.getAnnotationIndex(Sentence.type)) {
      List<BaseToken> tokens = JCasUtil.selectCovered(jCas, BaseToken.class, sentence);
      Assert.assertTrue(tokens.iterator().hasNext());
    }
  }

  @Test
  public void testSelectCoveredWithPriorities() throws Exception {
    JCas jCas = getJCas(
        Sentence.class.getName(),
        WordToken.class.getName(),
        NumToken.class.getName());
    for (Annotation sentence : jCas.getAnnotationIndex(Sentence.type)) {
      List<BaseToken> tokens = JCasUtil.selectCovered(jCas, BaseToken.class, sentence);
      Assert.assertTrue(tokens.iterator().hasNext());
    }
  }

  @Test
  public void testSubiteratorWithNoPriorities() throws Exception {
    JCas jCas = getJCas();
    for (Annotation sentence : jCas.getAnnotationIndex(Sentence.type)) {
      FSIterator<?> tokenIterator = jCas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
      Assert.assertTrue(tokenIterator.hasNext());
    }
  }

  @Test
  public void testSubiteratorWithPriorities() throws Exception {
    JCas jCas = getJCas(
        Sentence.class.getName(),
        WordToken.class.getName(),
        NumToken.class.getName());
    for (Annotation sentence : jCas.getAnnotationIndex(Sentence.type)) {
      FSIterator<?> tokenIterator = jCas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
      Assert.assertTrue(tokenIterator.hasNext());
    }
  }

  public JCas getJCas(String... typePriorities) throws Exception {
    TypeSystemDescription typeSystemDescription = TypeSystemDescriptionFactory.createTypeSystemDescription("common_type_system");
    AnalysisEngine engine = AnalysisEngineFactory.createPrimitive(
        NoOpAnnotator.class,
        typeSystemDescription,
        typePriorities,
        new Object[] {});
    JCas jCas = engine.newJCas();
    jCas.setDocumentText("a b c");
    new Sentence(jCas, 0, 1).addToIndexes();
    new Sentence(jCas, 2, 3).addToIndexes();
    new Sentence(jCas, 4, 5).addToIndexes();
    new WordToken(jCas, 0, 1).addToIndexes();
    new NumToken(jCas, 2, 3).addToIndexes();
    new WordToken(jCas, 4, 5).addToIndexes();
    return jCas;
  }
}
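
Why only "testSubiteratorWithNoPriorities" fails can be pictured with a small plain-Java model. This is a sketch under stated assumptions, not UIMA's implementation: it follows the documented rule that a subiterator returns annotations b that come strictly after the bounding annotation in index order (begin ascending, end descending, then type priority) and lie within its span. The names SubiteratorModel, Ann, indexOrder, subiterator and selectCovered are hypothetical stand-ins, and treating a missing priority as a tie is a modeling choice (UIMA documents that case as undefined).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical model class; it does not use or depend on UIMA.
final class SubiteratorModel {

  /** Minimal stand-in for an annotation: a type name plus begin/end offsets. */
  static final class Ann {
    final String type;
    final int begin;
    final int end;

    Ann(String type, int begin, int end) {
      this.type = type;
      this.begin = begin;
      this.end = end;
    }
  }

  /**
   * Modeled index order: begin ascending, end descending, then position in the
   * priority list. If either type is missing from the list, the two compare as
   * equal (assumption: UIMA calls this case undefined).
   */
  static Comparator<Ann> indexOrder(List<String> priorities) {
    return (x, y) -> {
      if (x.begin != y.begin) {
        return Integer.compare(x.begin, y.begin);
      }
      if (x.end != y.end) {
        return Integer.compare(y.end, x.end);
      }
      int px = priorities.indexOf(x.type);
      int py = priorities.indexOf(y.type);
      if (px < 0 || py < 0) {
        return 0;
      }
      return Integer.compare(px, py);
    };
  }

  /** Keeps candidates that sort strictly after the bound and lie within its span. */
  static List<Ann> subiterator(List<Ann> candidates, Ann bound, List<String> priorities) {
    Comparator<Ann> order = indexOrder(priorities);
    List<Ann> result = new ArrayList<>();
    for (Ann a : candidates) {
      boolean after = order.compare(bound, a) < 0;
      boolean inside = a.begin >= bound.begin && a.end <= bound.end;
      if (after && inside) {
        result.add(a);
      }
    }
    return result;
  }

  /** Offset-only selection: index order and priorities play no role. */
  static List<Ann> selectCovered(List<Ann> candidates, Ann bound) {
    List<Ann> result = new ArrayList<>();
    for (Ann a : candidates) {
      if (a.begin >= bound.begin && a.end <= bound.end) {
        result.add(a);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Same offsets as getJCas() above: every token has exactly its sentence's span.
    List<Ann> tokens = Arrays.asList(
        new Ann("WordToken", 0, 1), new Ann("NumToken", 2, 3), new Ann("WordToken", 4, 5));
    Ann sentence = new Ann("Sentence", 0, 1);
    List<String> priorities = Arrays.asList("Sentence", "WordToken", "NumToken");

    System.out.println(subiterator(tokens, sentence, Collections.<String>emptyList()).size());
    System.out.println(subiterator(tokens, sentence, priorities).size());
    System.out.println(selectCovered(tokens, sentence).size());
  }
}
```

In this model the token with the same span as its sentence is dropped when no priorities are given, because it does not sort strictly after the sentence. Once Sentence precedes the token types in the priority list, the token is returned. The offset-only selection is unaffected either way, which matches the four test outcomes above.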

Original comment by steven.b...@gmail.com on 14 Aug 2012 at 4:31

GoogleCodeExporter commented 9 years ago
Thanks Steve. I have also attached the exported sample Eclipse project to make 
life a tad easier :).

If you can confirm this behavior, I think we should modify the unit test to 
check the token count in addition to mere existence, since I think it only 
happens to miss *some* tokens.  Just an idea...
For example:
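  @Test
  public void testSubiteratorWithNoPriorities() throws Exception {
    JCas jCas = getJCas();
    // Assumed expectation: each sentence in getJCas() covers exactly one token.
    int expectedSize = 1;
    for (Annotation sentence : jCas.getAnnotationIndex(Sentence.type)) {
      FSIterator<?> tokenIterator = jCas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
      // FSIterator has no size() method, so count by iterating.
      int actualSize = 0;
      while (tokenIterator.hasNext()) {
        tokenIterator.next();
        actualSize++;
      }
      Assert.assertEquals(expectedSize, actualSize);
    }
  }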

Original comment by peistat...@gmail.com on 14 Aug 2012 at 4:48

GoogleCodeExporter commented 9 years ago
Yeah, it would be fine to check the size in the tests, but my tests actually 
show that this is not an error in uimaFIT but an error in the code where you 
create your AnalysisEngines via uimaFIT: you're probably failing to specify 
your type priorities there. (And perhaps you did remember to specify them in 
the XML descriptors?)

Original comment by steven.b...@gmail.com on 14 Aug 2012 at 4:55

GoogleCodeExporter commented 9 years ago
That's what I originally thought, but specifying the TypePriorities didn't seem 
to help.  I actually think it's how UIMA stores the indexes internally, and the 
issue comes up even before the TypePriorities come into play.  I have also 
added that to the example now.

Original comment by peistat...@gmail.com on 14 Aug 2012 at 5:14

GoogleCodeExporter commented 9 years ago
The bug in your code is that you're creating the JCas from a 
TypeSystemDescription, with no TypePriorities. You need to create the JCas from 
the AnalysisEngine that has the TypePriorities. Take a look at the code I 
posted above that doesn't have the problem you're running into - the JCas is 
created by calling "engine.newJCas()" not by calling 
"JCasFactory.createJCas(typeSystemDescription)".

Original comment by steven.b...@gmail.com on 14 Aug 2012 at 6:16

GoogleCodeExporter commented 9 years ago
I think I didn't quite understand your suggestion here. 

Are you asking about propagating TypePriorities from primitives to aggregates? 
If so, yes, that definitely happens. For example, your code works fine for me 
if I only specify the type priorities in the sentenceAE:

    AnalysisEngineDescription sentenceAE = AnalysisEngineFactory.createPrimitiveDescription(
        SentenceAE.class,
        typeSystemDescription,
        typePriorities,
        "param",
        "value");
    AnalysisEngineDescription tokensAE = AnalysisEngineFactory.createPrimitiveDescription(
        TokensAE.class,
        typeSystemDescription,
        "param",
        "value");
    AggregateBuilder builder = new AggregateBuilder();
    builder.add(sentenceAE);
    builder.add(tokensAE);
    AnalysisEngine aggregateAE = builder.createAggregate();
    JCas jcas = aggregateAE.newJCas();
    jcas.setDocumentText("HISTORY:\n6117292345252345\nAddendum:");
    aggregateAE.process(jcas);

It also works fine if I only specify it for the tokensAE, or if I only specify 
it for the aggregate:

    ...
    AggregateBuilder builder = new AggregateBuilder(typeSystemDescription, typePriorities, null);
    ...

Or maybe you were asking if you can use an aggregate to create the JCas? That's 
definitely possible - that's what I'm doing in the code above.

Maybe you can just write down the complete signature of the method you're 
proposing?
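
The propagation described above, where priorities set on either delegate or on the aggregate itself all reach the aggregate's CAS, can be pictured as a simple merge of priority lists. The following is a plain-Java sketch, assuming the aggregate's effective priorities are its own lists followed by those of its delegates; PriorityMergeModel, Component and effectivePriorities are hypothetical names and not part of UIMA or uimaFIT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model class; it does not use or depend on UIMA or uimaFIT.
final class PriorityMergeModel {

  /** A component description: a name plus zero or more ordered priority lists. */
  static final class Component {
    final String name;
    final List<List<String>> priorityLists;

    Component(String name, List<List<String>> priorityLists) {
      this.name = name;
      this.priorityLists = priorityLists;
    }
  }

  /** Effective priority lists of an aggregate: its own lists, then each delegate's. */
  static List<List<String>> effectivePriorities(
      List<List<String>> own, List<Component> delegates) {
    List<List<String>> merged = new ArrayList<>(own);
    for (Component delegate : delegates) {
      merged.addAll(delegate.priorityLists);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> priorities = Arrays.asList("Sentence", "WordToken", "NumToken");
    List<List<String>> none = Collections.emptyList();

    // Priorities declared only on the sentence component, as in the first snippet above.
    Component sentenceAE = new Component("sentenceAE", Collections.singletonList(priorities));
    Component tokensAE = new Component("tokensAE", none);

    System.out.println(effectivePriorities(none, Arrays.asList(sentenceAE, tokensAE)));
  }
}
```

Under this model it does not matter which component carries the priority list. A JCas created without any of these descriptions (for example from a bare type system description) sees an empty list.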

Original comment by steven.b...@gmail.com on 14 Aug 2012 at 7:06

GoogleCodeExporter commented 9 years ago
Oh yes! Indeed, that did the trick.  Just a suggestion though: I think it would 
be more intuitive to allow the aggregates to preserve or set the TypePriorities 
from AnalysisEngineFactory.createAggregate():

    AnalysisEngine aggregateAE = AnalysisEngineFactory.createAggregate(
        engines, componentNames, typeSystemDescription, typePriorities, sofaMappings);
    aggregateAE.process(jcas);

I think it would also be a nice enhancement for JCasFactory to accept type 
priorities, which already seem to be supported by UIMA's CasCreationUtils class.

Original comment by peistat...@gmail.com on 14 Aug 2012 at 7:09

GoogleCodeExporter commented 9 years ago
I still didn't quite get it. Could you give the types of the parameters of the 
createAggregate that you're proposing?

Original comment by steven.b...@gmail.com on 14 Aug 2012 at 7:19

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 7 Jan 2013 at 4:51

GoogleCodeExporter commented 9 years ago
Closing this since the user's original problem appears to have been resolved 
and there has not been any further activity on this issue.

Original comment by richard.eckart on 20 Jul 2013 at 3:14