google-code-export / uimafit

Automatically exported from code.google.com/p/uimafit

Add (J)CasUtil.selectBetween #86

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I commonly need to select the annotations between two focus annotations. We
should add (J)CasUtil.selectBetween for this purpose. Here's some code that I'm
currently using for this:

  private static <T extends Annotation> List<T> selectBetween(
      JCas jCas,
      Class<T> annotationClass,
      Annotation ann1,
      Annotation ann2) {
    // The boundaries must be given in document order and must not overlap.
    if (ann1.getEnd() > ann2.getBegin()) {
      String message = "Expected first annotation before second, found:\n%s\n%s";
      throw new RuntimeException(String.format(message, ann1, ann2));
    }
    Type type = JCasUtil.getType(jCas, annotationClass);
    FSIterator<Annotation> iter = jCas.getAnnotationIndex(type).iterator();
    // Position the iterator at ann1's place in the index, then skip
    // anything that starts before ann1 ends.
    iter.moveTo(ann1);
    while (iter.isValid() && iter.get().getBegin() < ann1.getEnd()) {
      iter.moveToNext();
    }
    // Collect annotations until the first one that ends after ann2 begins.
    List<T> anns = new ArrayList<T>();
    while (iter.isValid() && iter.get().getEnd() <= ann2.getBegin()) {
      anns.add(annotationClass.cast(iter.get()));
      iter.moveToNext();
    }
    return anns;
  }

I'll try to find some time to add this and write some tests for it, but I won't 
complain if someone else beats me to it. ;-)

Original issue reported on code.google.com by steven.b...@gmail.com on 18 Apr 2011 at 3:27

GoogleCodeExporter commented 9 years ago
Do I understand correctly that this is basically equivalent to the following call?

selectCovered(jcas, annotationClass, ann1.getEnd(), ann2.getBegin())

Original comment by richard.eckart on 18 Apr 2011 at 5:16
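In plain terms, the selectCovered call above keeps every annotation whose begin is at or after the end of ann1 and whose end is at or before the begin of ann2. A minimal sketch of that filter over plain integer offsets follows; it uses no UIMA API, and the CoveredSketch and Span names are hypothetical stand-ins for a CAS index and an Annotation.

```java
import java.util.ArrayList;
import java.util.List;

final class CoveredSketch {
    // Hypothetical stand-in for a UIMA Annotation: begin/end offsets only.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Keeps every candidate lying fully inside [left, right], mirroring the
    // selectCovered(jcas, type, begin, end) behaviour discussed in this thread.
    static List<Span> covered(List<Span> candidates, int left, int right) {
        List<Span> result = new ArrayList<>();
        for (Span s : candidates) {
            if (s.begin >= left && s.end <= right) {
                result.add(s);
            }
        }
        return result;
    }

    // "Between" expressed as "covered by the gap": from the end of the first
    // boundary to the begin of the second boundary.
    static List<Span> between(List<Span> candidates, Span first, Span second) {
        return covered(candidates, first.end, second.begin);
    }
}
```

For example, between(List.of(new Span(55, 56)), new Span(38, 51), new Span(65, 76)) keeps the [55..56] span, because 55 >= 51 and 56 <= 65.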

GoogleCodeExporter commented 9 years ago
Yep, except that your concerns about the efficiency of using integer offsets
shouldn't apply here, since you can call moveTo() with the first annotation.

Original comment by steven.b...@gmail.com on 18 Apr 2011 at 6:43
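The moveTo argument rests on the annotation index being sorted, so an iterator can be positioned near the first boundary instead of walking the index from its start. The sketch below illustrates that idea with a binary search over a plain list sorted by begin offset. It is only an analogy under stated assumptions: the SortedSeekSketch and Span names are hypothetical, and this is not how UIMA implements FSIterator.moveTo internally.

```java
import java.util.ArrayList;
import java.util.List;

final class SortedSeekSketch {
    // Hypothetical stand-in for an annotation: begin/end offsets only.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Binary search for the first index whose begin is >= offset.
    // Plays the role of moveTo(): O(log n) instead of a linear scan.
    static int seek(List<Span> sortedByBegin, int offset) {
        int lo = 0;
        int hi = sortedByBegin.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedByBegin.get(mid).begin < offset) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Collects spans covered by [left, right], visiting only the candidates
    // whose begin falls inside the window.
    static List<Span> coveredSorted(List<Span> sortedByBegin, int left, int right) {
        List<Span> result = new ArrayList<>();
        for (int i = seek(sortedByBegin, left); i < sortedByBegin.size(); i++) {
            Span s = sortedByBegin.get(i);
            if (s.begin > right) {
                break; // the list is sorted, so every later span starts too late
            }
            if (s.end <= right) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Only the seek step depends on the sort order; spans sharing a begin offset can appear in any order, so the sketch does not need to reproduce UIMA's full index ordering.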

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 7 May 2011 at 5:28

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 2 Jan 2012 at 9:47

GoogleCodeExporter commented 9 years ago
I think this code is not working as desired:

Given a CAS with Token [38..51], Token [65..76] and Sentence [55..56], the
result of using the two Tokens as ann1 and ann2 should be the Sentence, but it
is empty instead.

Original comment by richard.eckart on 2 Jan 2012 at 9:57

GoogleCodeExporter commented 9 years ago
Since I think your reference code is not working as desired, I have added an
alternative implementation based on selectCovered. There are two new
selectBetween() methods in each of JCasUtil and CasUtil. Here is the JavaDoc:

Get a list of annotations of the given annotation type located between two
annotations. Does not use subiterators and does not respect type priorities.
Zero-width annotations that lie on the borders are included in the result,
e.g. if the boundary annotations are [1..2] and [2..3], then an annotation
[2..2] is returned. If there is a non-zero overlap between the boundary
annotations, the result is empty. The method properly handles cases where the
second boundary annotation occurs before the first boundary annotation by
switching their roles.

The average speedup over selectCovered(jcas, type, left, right) seems to be
around 1.6x, according to a small randomized unit test that I set up.

Can you please check whether the new methods work for you?
---
Committed revision 659.

Original comment by richard.eckart on 2 Jan 2012 at 10:05
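The contract in the JavaDoc quoted above can be illustrated on plain offsets. This is a minimal sketch, not the committed uimaFIT code: the BetweenSketch and Span names are hypothetical, and it assumes that "occurs before" is decided by comparing begin offsets, which the JavaDoc does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

final class BetweenSketch {
    // Hypothetical stand-in for a UIMA Annotation: begin/end offsets only.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    static List<Span> selectBetween(List<Span> candidates, Span ann1, Span ann2) {
        // Assumption: order is decided by begin offset; swap roles if needed.
        Span left = ann1;
        Span right = ann2;
        if (ann2.begin < ann1.begin) {
            left = ann2;
            right = ann1;
        }

        List<Span> result = new ArrayList<>();
        // A non-zero overlap between the boundaries leaves no gap: empty result.
        if (left.end > right.begin) {
            return result;
        }

        // Keep everything inside the gap [left.end, right.begin]. Zero-width
        // spans sitting exactly on a border satisfy both inequalities.
        for (Span s : candidates) {
            if (s.begin >= left.end && s.end <= right.begin) {
                result.add(s);
            }
        }
        return result;
    }
}
```

With the example from earlier in the thread, passing Token [65..76] and Token [38..51] in either order yields Sentence [55..56]; boundaries [1..2] and [2..3] yield the zero-width [2..2]; and boundaries [1..3] and [2..4] yield an empty list because they overlap.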

GoogleCodeExporter commented 9 years ago
Yeah, I agree that your implementation does what I would want it to. Thanks for 
both adding this and fixing bugs in my code. ;-)

Original comment by steven.b...@gmail.com on 3 Jan 2012 at 1:21

GoogleCodeExporter commented 9 years ago
Removed unnecessary test code.
---
Committed revision 663.

Original comment by richard.eckart on 4 Jan 2012 at 2:51

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 4 Jan 2012 at 10:52