fangfangli / cleartk

Automatically exported from code.google.com/p/cleartk

Evaluation infrastructure #172

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Folks, ClearTK could really use some basic evaluation infrastructure for 
classifier-based annotators.  I recently implemented an evaluation interface 
for another project that evaluates ClearTK annotators, and I am really 
happy with it so far.  That's not surprising, considering I have done this in my own 
sandbox several times now.  I am thinking about copying this code over to 
ClearTK or implementing something similar.  See the code in the evaluation 
package 
[http://code.google.com/p/biomedicus/source/browse/#svn/trunk/Biomedicus/src/main/java/edu/umn/biomedicus/evaluation here].  

There are three interfaces: CorpusFactory, EngineFactory, and 
EvaluationFactory.  The CorpusFactory has methods like createTrainReader(int 
fold) and createTestReader(int fold); the javadocs are pretty complete here.  
The EngineFactory has three methods: createTrainingAggregate, 
createClassifierAggregate, and train(directory).  The EvaluationFactory has two 
methods: createEvaluationAggregate and aggregateEvaluationResults; the latter 
allows you to aggregate results from, e.g., 10-fold cross-validation.  
Finally, the Evaluation class provides convenience methods for running 
cross-validation, holdout evaluation, and training a model on an entire corpus 
for runtime use.  
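
For concreteness, here is a rough Java sketch of what those three interfaces might look like.  The return types, parameters, and throws clauses are my guesses based on the description above, not the actual signatures in the biomedicus code:

    // Rough sketch only: return types, parameters, and exceptions are guesses.
    // (In practice each interface would live in its own source file.)
    import java.io.File;
    import org.apache.uima.analysis_engine.AnalysisEngineDescription;
    import org.apache.uima.collection.CollectionReader;

    interface CorpusFactory {
      CollectionReader createTrainReader(int fold) throws Exception; // training data for one fold
      CollectionReader createTestReader(int fold) throws Exception;  // held-out data for that fold
    }

    interface EngineFactory {
      AnalysisEngineDescription createTrainingAggregate(File outputDirectory) throws Exception;
      AnalysisEngineDescription createClassifierAggregate(File modelDirectory) throws Exception;
      void train(File directory) throws Exception; // invoke the classifier trainer on the written data
    }

    interface EvaluationFactory {
      AnalysisEngineDescription createEvaluationAggregate(File evaluationDirectory) throws Exception;
      void aggregateEvaluationResults(File evaluationDirectory) throws Exception; // e.g. combine 10 folds
    }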

This amounts to some basic scaffolding for setting up evaluation code for a 
given ClearTK annotator.  The real work is done by the implementations of these 
interfaces.  There are two implementations in the project I created this for: 
one for part-of-speech tagging and another for sentence annotation.  You might 
look at either of these for an example of how it all works together in practice.  
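
To give a feel for how the scaffolding would be used, here is a purely hypothetical wiring of the pieces.  The implementation class names and the Evaluation method names below are illustrative guesses, not the actual API:

    // Hypothetical usage: every class and method name below is illustrative only.
    CorpusFactory corpusFactory = new PennTreebankCorpusFactory(new File("data/treebank"));
    EngineFactory engineFactory = new PosTaggerEngineFactory();
    EvaluationFactory evaluationFactory = new AccuracyEvaluationFactory();

    Evaluation evaluation = new Evaluation(corpusFactory, engineFactory, evaluationFactory);
    evaluation.runCrossValidation(new File("target/eval/folds"), 10); // train/test once per fold
    evaluation.runHoldoutEvaluation(new File("target/eval/holdout")); // single train/test split
    evaluation.trainCorpusModel(new File("target/model"));            // whole-corpus model for runtime use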

Last spring/summer I threw together some one-off code for evaluating ClearTK's 
part-of-speech tagger and never added it to ClearTK because I didn't really 
like it.  I need some evaluation code for sentence segmentation (see #99) and 
would like to have something like the evaluation infrastructure I set up for 
biomedicus.  My proposal is to do something very similar to what was done 
there.  

Any thoughts you might have about your own experiences writing evaluation 
code for ClearTK annotators, your requirements, my proposal, etc. would be 
appreciated.  

Original issue reported on code.google.com by pvogren@gmail.com on 1 Dec 2010 at 6:41

GoogleCodeExporter commented 9 years ago
A big +1 for some kind of support for training/testing and cross-validation. 
Some comments on your specific approach:

* CorpusFactory has a lot of methods. We should probably have a CorpusFactory 
subclass that requires just a few pieces of information, e.g. the names of the 
training files, the names of the testing files, and the number of folds the 
training files should be split into. That should be enough information to fill 
in the bodies of all those methods (see the sketch after this list).

* EngineFactory looks fine.

* EvaluationFactory is okay, I guess, but the interface of having to write 
evaluation results to files and then read them back in seems a little clunky. 
But I guess there's no real output of a UIMA pipeline except files, so there's 
no obvious way to keep those evaluation results in memory. At the very least, 
we should supply some default subclasses for the common case of accuracy and 
f-score evaluations.
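
Here is a minimal sketch of what such a convenience base class might look like.  The class name, the createReader hook, and the round-robin fold assignment are all assumptions on my part, not existing code:

    // Minimal sketch, assuming the CorpusFactory interface described earlier.
    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.uima.collection.CollectionReader;

    public abstract class FileListCorpusFactory implements CorpusFactory {

      private final List<File> trainFiles;
      private final List<File> testFiles;
      private final int folds;

      public FileListCorpusFactory(List<File> trainFiles, List<File> testFiles, int folds) {
        this.trainFiles = trainFiles;
        this.testFiles = testFiles;
        this.folds = folds;
      }

      // Subclasses only supply a reader over a list of files; the rest is derived.
      protected abstract CollectionReader createReader(List<File> files) throws Exception;

      public CollectionReader createTrainReader(int fold) throws Exception {
        List<File> train = new ArrayList<File>();
        for (int i = 0; i < trainFiles.size(); i++) {
          if (i % folds != fold) { // every file outside this fold is training data
            train.add(trainFiles.get(i));
          }
        }
        return createReader(train);
      }

      public CollectionReader createTestReader(int fold) throws Exception {
        List<File> test = new ArrayList<File>();
        for (int i = 0; i < trainFiles.size(); i++) {
          if (i % folds == fold) { // files assigned to this fold are held out
            test.add(trainFiles.get(i));
          }
        }
        return createReader(test);
      }

      // Full train/test split, for holdout evaluation and final training
      public CollectionReader createTrainReader() throws Exception {
        return createReader(trainFiles);
      }

      public CollectionReader createTestReader() throws Exception {
        return createReader(testFiles);
      }
    }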

As a side note, the lines in a few places in your code that look like:

    if(fold < 10) {
        foldDirectory = new File(outputDirectory, "fold0"+fold);

needs to be fixed for folks who use more than 10 folds. Something like:

    // pad fold numbers with enough leading zeros for the total number of folds
    String format = String.format("fold%%0%dd", (int) Math.ceil(Math.log10(folds)));
    foldDirectory = new File(outputDirectory, String.format(format, fold));

Original comment by steven.b...@gmail.com on 1 Dec 2010 at 9:46

GoogleCodeExporter commented 9 years ago
I suppose we could have two CorpusFactory interfaces, with the _ImplBase 
implementing both.  I'm indifferent about this.  

Yeah - the output directory in EvaluationFactory is not super elegant, but I 
think it's the best we can do here.  

I'll try out the directory name formatting.  Looks cool!

Any thoughts about where I should put this evaluation code?  I am thinking 
either in cleartk-ml or in a separate project, cleartk-ml-evaluation, with an 
initial preference for the former.  

Original comment by pvogren@gmail.com on 2 Dec 2010 at 6:57

GoogleCodeExporter commented 9 years ago
I think cleartk-ml is fine. Anyone who is using ClearTK for ML seriously will 
want evaluation code.

Original comment by steven.b...@gmail.com on 3 Dec 2010 at 8:24

GoogleCodeExporter commented 9 years ago
The cleartk-eval package provides this functionality.

Original comment by steven.b...@gmail.com on 12 Feb 2012 at 4:28

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 5 Aug 2012 at 8:58