Closed GoogleCodeExporter closed 9 years ago
So in this approach, you would be responsible for making calls to
SimplePipeline.run() in your train and test methods? I actually much prefer
this over the pipeline providers as I often get lost as to what the pipeline
looks like. It would be really nice if whatever pipeline I'm running in the
eval could also be shared with a standalone training main class. That isn't
possible with the existing eval. Do you think it would be with this?
What would a STATS_TYPE class look like? Is it pretty much any object you want
to stuff results into? Could it work with the existing fcollections?
At first pass I like this idea. Perhaps we should test out any prototypes on
the tf*idf document classification example.
Original comment by lee.becker
on 1 May 2012 at 4:42
Yep, you call the SimplePipeline.run() yourself. It should make it much easier
to see what the pipeline really looks like.
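A minimal sketch of the shape being discussed, so the idea is concrete. Everything here is hypothetical: EvaluationSketch and MajorityLabelEvaluation are illustrative names, not the committed ClearTK API, and the toy subclass trains on a plain list of labels instead of running a UIMA pipeline. In a real evaluation, the bodies of train() and test() are where you would make your own calls to SimplePipeline.run().

```java
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class (not the ClearTK API): you write train() and test()
// yourself, so the whole pipeline for each phase is visible in one place.
abstract class EvaluationSketch<ITEM_TYPE, STATS_TYPE> {
  // In a UIMA setting, this body would build and run the training pipeline.
  protected abstract void train(List<ITEM_TYPE> items, File modelDirectory);

  // In a UIMA setting, this body would run the classifier and collect results.
  protected abstract STATS_TYPE test(List<ITEM_TYPE> items, File modelDirectory);

  public STATS_TYPE trainAndTest(
      List<ITEM_TYPE> trainItems, List<ITEM_TYPE> testItems, File modelDirectory) {
    train(trainItems, modelDirectory);
    return test(testItems, modelDirectory);
  }
}

// Toy subclass: items are gold labels, the "model" is the most frequent
// training label, and the statistics object is simply the accuracy.
class MajorityLabelEvaluation extends EvaluationSketch<String, Double> {
  // Kept in memory for brevity; a real implementation would save the model
  // under modelDirectory and load it again in test().
  private String majorityLabel;

  @Override
  protected void train(List<String> labels, File modelDirectory) {
    Map<String, Integer> counts = new HashMap<>();
    int bestCount = 0;
    for (String label : labels) {
      int count = counts.merge(label, 1, Integer::sum);
      if (count > bestCount) {
        bestCount = count;
        majorityLabel = label;
      }
    }
  }

  @Override
  protected Double test(List<String> labels, File modelDirectory) {
    if (labels.isEmpty()) {
      return 0.0;
    }
    int correct = 0;
    for (String label : labels) {
      if (label.equals(majorityLabel)) {
        correct++;
      }
    }
    return correct / (double) labels.size();
  }
}
```

Because train() is an ordinary method, a standalone training main class could call it directly with the same arguments the evaluation uses.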
I'm working on translating the cleartk-timeml evaluations to this paradigm. So
far, it looks good, but I'll report back when I've finished.
Original comment by steven.b...@gmail.com
on 1 May 2012 at 4:59
Oh, and the STATS_TYPE could be anything you want, but in the simplest version,
it would probably look like the EvaluationStatistics we have in the relation
extraction code.
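For illustration only, here is a rough guess at what such a statistics object might hold. SimpleStats is a hypothetical class, not the EvaluationStatistics from the relation extraction code; it assumes the evaluation counts gold, predicted, and correct items and wants precision, recall, and F1 out of them.

```java
// Hypothetical STATS_TYPE: a plain counter object that test() can return
// and that cross-validation folds can merge.
final class SimpleStats {
  private int goldCount;
  private int predictedCount;
  private int correctCount;

  // Record the counts from one document (or one fold).
  void add(int gold, int predicted, int correct) {
    goldCount += gold;
    predictedCount += predicted;
    correctCount += correct;
  }

  // Merge another statistics object into this one.
  void addAll(SimpleStats other) {
    add(other.goldCount, other.predictedCount, other.correctCount);
  }

  // Assumption: with nothing predicted, precision is reported as 0.
  double precision() {
    return predictedCount == 0 ? 0.0 : correctCount / (double) predictedCount;
  }

  // Assumption: with no gold items, recall is reported as 0.
  double recall() {
    return goldCount == 0 ? 0.0 : correctCount / (double) goldCount;
  }

  double f1() {
    double p = precision();
    double r = recall();
    return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
  }
}
```

Since STATS_TYPE is just a type parameter, a collection-based results object should fit equally well, as long as test() can return it.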
Original comment by steven.b...@gmail.com
on 1 May 2012 at 5:01
In r3897, I committed a version of this, and in r3898 applied it to
cleartk-timeml. I've also applied it to the SHARPn relation-extractor in
cTAKES, which is really a great illustration of how much cleaner this API is.
Before:
http://ohnlp.svn.sourceforge.net/viewvc/ohnlp/branches/SHARPn-cTAKES/relation-extractor/src/org/chboston/cnlp/ctakes/relationextractor/eval/RelationExtractorEvaluation.java?revision=798
After:
http://ohnlp.svn.sourceforge.net/viewvc/ohnlp/branches/SHARPn-cTAKES/relation-extractor/src/org/chboston/cnlp/ctakes/relationextractor/eval/RelationExtractorEvaluation.java?revision=838
Note that the "after" link doesn't show all the other classes we got to
delete along with the edits in RelationExtractorEvaluation. The new APIs let us
get rid of CorpusReaderProvider_ImplBase, XMICorpusReaderProvider,
DegreeOfRelationExtractorPipelineProvider,
EntityMentionPairRelationExtractorPipelineProvider and
RelationExtractionPipelineProvider.
For me, the key difference is that it's now very clear what the training
procedure looks like and what the testing procedure looks like. In the old
version, the training procedure was partly hidden in the cleartk-eval code and
partly in the pipeline provider you defined, while the testing procedure was
partly in the classification pipeline and partly in the evaluation pipeline.
Also note how easy it is with the new API to support grid search - no need to
save and load things from odd places on the file system.
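A sketch of why grid search gets simple once training and testing are plain method calls. GridSearchSketch and its best() method are hypothetical names for illustration, not part of cleartk-eval; the scoring function stands in for "train with this parameter setting, test, and return a number such as F1".

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper: try every candidate setting and keep the best one.
final class GridSearchSketch {
  // trainAndScore is assumed to train and test with the given setting and
  // return a score where higher is better.
  static <P> P best(List<P> candidates, ToDoubleFunction<? super P> trainAndScore) {
    if (candidates.isEmpty()) {
      throw new IllegalArgumentException("no candidate settings given");
    }
    P bestCandidate = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (P candidate : candidates) {
      double score = trainAndScore.applyAsDouble(candidate);
      if (score > bestScore) {
        bestScore = score;
        bestCandidate = candidate;
      }
    }
    return bestCandidate;
  }
}
```

Each candidate is scored entirely in memory, so nothing has to be saved to or loaded from the file system between settings unless the model itself needs it.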
I would like to deprecate all the old cleartk-eval APIs. What do you guys think?
Original comment by steven.b...@gmail.com
on 2 May 2012 at 6:51
This issue was closed by revision r3901.
Original comment by steven.b...@gmail.com
on 3 May 2012 at 9:35
I went ahead and deprecated the old APIs. If anyone really thinks that was a
bad idea, we can back this change out.
Original comment by steven.b...@gmail.com
on 3 May 2012 at 9:36
Original comment by steven.b...@gmail.com
on 5 Aug 2012 at 8:49
Original issue reported on code.google.com by
steven.b...@gmail.com
on 1 May 2012 at 10:57