google-code-export / uimafit

Automatically exported from code.google.com/p/uimafit

Debug utility to access all annotations by type #123

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
It may be useful to debug analysis engines by reviewing the entire set of 
annotations in a JCas. UIMA (and uimaFIT) only seem to provide iterators by 
type. This could be resolved by providing a JCasUtil.iterator() method that 
iterates over all FSes, or by providing a method that iterates over the 
per-type iterators.

For example, using UIMA in Python via JPype, I can get a count of each 
annotation type with:

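for typ in self.jcas.getTypeSystem().getTypeIterator():
    try:
        # Count the entries in this type's annotation index.
        count = sum(1 for a in self.jcas.getAnnotationIndex(typ).iterator())
        if count:
            yield typ, count
    except Exception:
        # Skip types for which no annotation index can be obtained.
        pass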

Original issue reported on code.google.com by joel.nothman@gmail.com on 24 May 2012 at 4:15

GoogleCodeExporter commented 9 years ago
Types in UIMA all inherit from a common type TOP. If you want to iterate over 
everything in the CAS, try:

for(TOP t : JCasUtil.select(jcas, TOP.class)) {
   ...
}

Original comment by richard.eckart on 24 May 2012 at 8:07
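
The reasoning in the reply above (every type derives from TOP, so selecting by TOP returns everything) can be illustrated with a toy model in plain Python. This is only a sketch and not UIMA or uimaFIT code: the classes, the list standing in for a CAS, and the helpers `select`, `select_all`, and `count_by_type` are all hypothetical stand-ins invented for illustration.

```python
from collections import Counter


class TOP:
    """Hypothetical stand-in for the root type that every other type derives from."""


class Annotation(TOP):
    """Hypothetical stand-in for an annotation with begin/end offsets."""

    def __init__(self, begin, end):
        self.begin = begin
        self.end = end


class Token(Annotation):
    pass


class Sentence(Annotation):
    pass


class DocumentMetaData(Annotation):
    """Hypothetical stand-in for non-linguistic bookkeeping stored next to the rest."""


def select(cas, type_):
    """Return every item in the toy CAS that is an instance of type_ or a subtype."""
    return [fs for fs in cas if isinstance(fs, type_)]


def select_all(cas):
    """Selecting by the root type returns everything in the toy CAS."""
    return select(cas, TOP)


def count_by_type(cas):
    """Count items per concrete type name, as a quick 'what is in my CAS?' check."""
    return Counter(type(fs).__name__ for fs in select_all(cas))


# The toy CAS is just a list of instances (an assumption made for this sketch).
toy_cas = [DocumentMetaData(0, 11), Sentence(0, 11), Token(0, 5), Token(6, 11)]
counts = count_by_type(toy_cas)
```

In this model a dedicated "select everything" helper is a one-line wrapper around selection by the root type, and the result also contains the bookkeeping item alongside the linguistic ones.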

GoogleCodeExporter commented 9 years ago
Okay... so is it useful to make that clear for the uninitiated by providing 
JCasUtil.select(jcas)?

Original comment by joel.nothman@gmail.com on 24 May 2012 at 8:45

GoogleCodeExporter commented 9 years ago
I don't know. I rarely need to actually iterate over everything in the CAS. I 
mean, you also get back things like the DocumentMetaData annotation, not only 
linguistic stuff. I think there are few use-cases. I don't have strong 
objections to a JCasUtil.selectAll(jcas), but I don't think it is 
necessary or that it would be a real improvement.

Original comment by richard.eckart on 24 May 2012 at 12:57

GoogleCodeExporter commented 9 years ago
Does anybody have a strong desire for a selectAll(jcas) method? If there are no 
comments I'm going to close this issue as WontFix in a couple of days.

Original comment by richard.eckart on 5 Jun 2012 at 8:48

GoogleCodeExporter commented 9 years ago
I'm ambivalent about this.  On the one hand, I think your code example above 
should suffice.  On the other hand, I can understand Joel's frustration with 
trying to navigate UIMA and uimaFIT and just trying to figure out how to do 
simple things like ask "what's in my CAS?"  As convenient as uimaFIT is for 
experienced UIMA developers, it isn't quite as accessible as it could be to 
newbies.  In fact, I can see that uimaFIT could/should be a starting point for 
new UIMA developers and so we might benefit by catering to that audience.  So, 
I lean towards adding it even if it doesn't add that much value.  

Original comment by phi...@ogren.info on 6 Jun 2012 at 3:53

GoogleCodeExporter commented 9 years ago
I didn't quite finish my thought.... So, I lean towards adding it even if it 
doesn't add that much value AND it isn't any more code than what you have above.

Original comment by phi...@ogren.info on 6 Jun 2012 at 3:54

GoogleCodeExporter commented 9 years ago
Similarly, I now appreciate that the type hierarchy makes this an easy 
operation to perform (and that, except for very simple debugging tasks, it is 
of little use).

But I do think that uimafit should be promoting itself as a quick-start for 
UIMA beginners, in that it is able to hide layers of complexity through 
factories and helper functions (and XML-avoidance). Thus, the proportion of the 
UIMA documentation that is no longer prerequisite reading to do X is a very 
good measure of a feature's utility.

In this instance, a beginner who inspects the uimafit code to find that 
JCasUtil.select(jcas) === JCasUtil.select(jcas, TOP.class) has learnt a very 
quick lesson.

Original comment by joel.nothman@gmail.com on 6 Jun 2012 at 4:23

GoogleCodeExporter commented 9 years ago
Added select(CAS/JCas) in JCasUtil and CasUtil and selectFS(CAS) in CasUtil.

Original comment by richard.eckart on 5 Jul 2012 at 10:17

GoogleCodeExporter commented 9 years ago
Renamed the methods to selectAll() and selectAllFS().

Original comment by richard.eckart on 15 Jul 2012 at 8:34