Add preProcessingPipeline into CleartkPipelineProvider

GoogleCodeExporter commented 9 years ago

[Lee] 
What is the best way to analyze the entire training set prior
to feature extraction?  I would like to compute some statistics, such as
the distribution of class labels, n-gram probabilities, etc., for
later use during feature extraction.
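
For illustration only, the kind of corpus-level pass described here might look like the sketch below. It is plain Java with no ClearTK or UIMA dependency; the class `CorpusStatistics` and its methods are hypothetical names invented for this example, and bigrams stand in for n-grams in general.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of ClearTK): accumulates label and bigram
// counts over a training set so they can be queried during feature extraction.
public class CorpusStatistics {
    private final Map<String, Integer> labelCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private int totalLabels = 0;
    private int totalBigrams = 0;

    // Record one training example: its tokens and its class label.
    public void observe(List<String> tokens, String label) {
        labelCounts.merge(label, 1, Integer::sum);
        totalLabels++;
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String bigram = tokens.get(i) + " " + tokens.get(i + 1);
            bigramCounts.merge(bigram, 1, Integer::sum);
            totalBigrams++;
        }
    }

    // Relative frequency of a class label among all observed examples.
    public double labelProbability(String label) {
        if (totalLabels == 0) {
            return 0.0;
        }
        return labelCounts.getOrDefault(label, 0) / (double) totalLabels;
    }

    // Relative frequency of a bigram among all observed bigrams (unsmoothed).
    public double bigramProbability(String first, String second) {
        if (totalBigrams == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(first + " " + second, 0) / (double) totalBigrams;
    }
}
```

The statistics object would be filled in a first pass over the training data and then handed to whatever feature extractor needs it in a second pass.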

[Philip]
Unfortunately, our evaluation pipeline doesn't allow for this kind of 
preprocessing of the corpus.  It seems like maybe we need to add another 
method to the interface?  This would make sense if you want to calculate the 
various metrics for each fold before you train the model. 

[Philip]
Ok.  Feel free to propose an extension to the evaluation interfaces.
Perhaps you need a new method in CleartkPipelineProvider?  Maybe something
like

public List<AnalysisEngine> getPreprocessingPipeline(String name) throws UIMAException;

Original issue reported on code.google.com by lee.becker on 16 Feb 2011 at 11:48

GoogleCodeExporter commented 9 years ago

Lee - another possibility is to write your own version of Evaluation.java that 
does this instead.  Just a thought.

Original comment by phi...@ogren.info on 20 Feb 2011 at 3:20

GoogleCodeExporter commented 9 years ago

Fixed in r3901 with the new evaluation APIs from Issue 304. You can now have 
whatever preprocessing you want in the train method.
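
As a rough sketch of what "preprocessing in the train method" can mean, the example below runs a statistics pass over the training fold before feature extraction. It deliberately avoids the real ClearTK evaluation classes: `TwoPassTrainingSketch`, `countTokens`, `extractFeatures`, and this `train` signature are all hypothetical stand-ins, and documents are simplified to lists of tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an evaluation class whose train step is free to
// preprocess the training fold before extracting features.
public class TwoPassTrainingSketch {

    // Pass 1: count how often each token occurs across the training fold.
    static Map<String, Integer> countTokens(List<List<String>> trainingDocs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : trainingDocs) {
            for (String token : doc) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Pass 2: one feature per distinct token in a document, valued by the
    // token's relative frequency in the whole training fold.
    static List<Map<String, Double>> extractFeatures(
            List<List<String>> docs, Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        List<Map<String, Double>> features = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> docFeatures = new HashMap<>();
            for (String token : doc) {
                double freq = total == 0 ? 0.0 : counts.getOrDefault(token, 0) / (double) total;
                docFeatures.put(token, freq);
            }
            features.add(docFeatures);
        }
        return features;
    }

    // The "train" step: preprocessing first, then feature extraction.
    // A real implementation would go on to train a classifier on the features.
    public static List<Map<String, Double>> train(List<List<String>> trainingDocs) {
        Map<String, Integer> counts = countTokens(trainingDocs);
        return extractFeatures(trainingDocs, counts);
    }
}
```

Because the counts are computed inside `train`, each cross-validation fold gets statistics derived only from its own training data.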

Original comment by steven.b...@gmail.com on 3 May 2012 at 9:39

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 5 Aug 2012 at 8:48

Added labels: Component-eval, Milestone-1.2

fangfangli / cleartk

Add preProcessingPipeline into CleartkPipelineProvider #230