dkpro / dkpro-tc

UIMA-based text classification framework built on top of DKPro Core and DKPro Lab.
https://dkpro.github.io/dkpro-tc/

Create a dedicated evaluation object/module #50

Closed daxenberger closed 8 years ago

daxenberger commented 9 years ago

Originally reported on Google Code with ID 50

Some reports might produce mandatory output which other tasks or reports (potentially)
depend on. This is the case for some of the reports which create the results of the
test task (f-scores etc.). We should think about moving this (core) functionality into
a dedicated task.

Reported by daxenberger.j on 2013-09-17 16:41:45

Edit by Tobias Horsmann - ToDo list:

daxenberger commented 9 years ago
Agreed.  See comments on r704.
I am particularly interested in moving the calculations out of BatchCrossValidationReport,
and into an Object that all Reports can access.

Reported by EmilyKJamison on 2014-03-21 18:25:17

daxenberger commented 9 years ago
A task may be a bit heavy, but creating some separate classes for data structures commonly
used in reports and for evaluations over these data structures appears sensible. The
structures and functionality could easily be reused and the reports would become more
light-weight.

Reported by richard.eckart on 2014-03-23 12:44:03

daxenberger commented 9 years ago
We need to create a central Evaluation Object in TC, which will serve as a connector
between the machine learning framework and the evaluation (i.e. all reports). 

Why?
1) This Evaluation Object can be used to create a file in a format which can be imported
into an external program to do significance tests (issue 112).
2) To avoid bias in the aggregation of results from CV folds, an overall confusion
matrix should be created which is used to further calculate F1 etc. An Evaluation Object
can also hold the overall confusion matrix (issue 113).
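The bias mentioned in point 2 can be illustrated with a small sketch. This is not DKPro TC code, just a standalone demonstration of why pooling an overall confusion matrix across CV folds gives a different (and fairer) F1 than averaging the per-fold F1 scores, especially when folds differ in size:

```java
// Standalone sketch (not the DKPro TC API): averaging per-fold F1
// vs. computing F1 from a pooled confusion matrix.
public class PooledF1Demo {

    /** F1 from raw counts: true positives, false positives, false negatives. */
    public static double f1(long tp, long fp, long fn) {
        double p = (tp + fp == 0) ? 0 : (double) tp / (tp + fp);
        double r = (tp + fn == 0) ? 0 : (double) tp / (tp + fn);
        return (p + r == 0) ? 0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // two folds of very different size: {tp, fp, fn}
        long[][] folds = { { 1, 0, 1 }, { 90, 10, 10 } };

        // (a) macro-averaging per-fold F1 gives the tiny fold equal weight
        double avg = 0;
        for (long[] f : folds) {
            avg += f1(f[0], f[1], f[2]);
        }
        avg /= folds.length;

        // (b) pooling the counts first weights every instance equally
        long tp = 0, fp = 0, fn = 0;
        for (long[] f : folds) {
            tp += f[0]; fp += f[1]; fn += f[2];
        }
        double pooled = f1(tp, fp, fn);

        // the two strategies disagree: avg ~0.783, pooled ~0.897
        System.out.printf("avg=%.3f pooled=%.3f%n", avg, pooled);
    }
}
```

An evaluation object holding the overall confusion matrix makes strategy (b) the natural default.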

Reported by daxenberger.j on 2014-04-22 15:42:58

daxenberger commented 9 years ago

Reported by daxenberger.j on 2014-04-22 15:48:43

daxenberger commented 9 years ago

Reported by daxenberger.j on 2014-04-22 15:49:15

daxenberger commented 9 years ago

Reported by daxenberger.j on 2014-06-04 12:34:49

daxenberger commented 9 years ago

Reported by daxenberger.j on 2014-09-05 08:45:17

daxenberger commented 9 years ago
This issue was updated by r1133 and r1134.

Reported by daxenberger.j on 2014-10-08 09:12:56

daxenberger commented 9 years ago
This issue was updated by revision r1136.

tests for soft/strict evaluation.

Reported by daxenberger.j on 2014-10-08 09:48:59

daxenberger commented 9 years ago
This issue was updated by revision r1137.

adding TODOs.

Reported by daxenberger.j on 2014-10-08 10:00:34

daxenberger commented 9 years ago
This issue was updated by revision r1289.

introducing a generic multi-label result wrapper to work with the latest version of
meka and DKPro TC's new evaluation module 

Reported by daxenberger.j on 2014-12-10 08:54:40

daxenberger commented 9 years ago
This issue was updated by revision r1359.

Created a test for the new evaluation report, in particular the outcome id report

Reported by daxenberger.j on 2015-03-17 11:47:34

daxenberger commented 9 years ago
This issue was updated by revision r1366.

Expanding functionality of the evaluation module/helper classes.
Work in progress.

Reported by daxenberger.j on 2015-03-17 16:16:36

daxenberger commented 9 years ago

Reported by daxenberger.j on 2015-03-27 13:05:34

daxenberger commented 9 years ago
Good reference for multi-label evaluation and calculation of scores: http://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics
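Two of the simplest metrics from that reference can be sketched directly. The sketch below is illustrative only (plain Java, not the DKPro TC or Meka API): subset accuracy counts an instance as correct only when the whole predicted label set matches exactly, while Hamming loss gives partial credit per label cell:

```java
// Illustrative sketch of two multi-label metrics from the scikit-learn
// reference above; not taken from DKPro TC or Meka.
public class MultiLabelMetrics {

    /** Fraction of instances whose full label vector matches exactly. */
    public static double subsetAccuracy(boolean[][] gold, boolean[][] pred) {
        int exact = 0;
        for (int i = 0; i < gold.length; i++) {
            if (java.util.Arrays.equals(gold[i], pred[i])) {
                exact++;
            }
        }
        return (double) exact / gold.length;
    }

    /** Fraction of individual label cells that are wrong. */
    public static double hammingLoss(boolean[][] gold, boolean[][] pred) {
        int wrong = 0, total = 0;
        for (int i = 0; i < gold.length; i++) {
            for (int j = 0; j < gold[i].length; j++, total++) {
                if (gold[i][j] != pred[i][j]) {
                    wrong++;
                }
            }
        }
        return (double) wrong / total;
    }

    public static void main(String[] args) {
        // 2 instances, 3 labels each; pred differs from gold in one cell
        boolean[][] gold = { { true, false, true }, { false, true, false } };
        boolean[][] pred = { { true, false, true }, { true, true, false } };
        System.out.println(subsetAccuracy(gold, pred)); // 1 of 2 exact matches
        System.out.println(hammingLoss(gold, pred));    // 1 of 6 cells wrong
    }
}
```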

Reported by daxenberger.j on 2015-03-30 08:24:47

daxenberger commented 9 years ago
This issue was updated by revision r1396.

restructuring evaluation module; mostly multi-label part
several tests are broken/ignored atm, needs to be investigated

Reported by daxenberger.j on 2015-04-02 16:04:35

daxenberger commented 9 years ago
This issue was updated by revision r1400.

enabling crossvalidation setup with evaluation module
adding test for crossvalidation setup with evaluation module

Reported by daxenberger.j on 2015-04-03 15:09:25

daxenberger commented 9 years ago
This issue was updated by revision r1404.

copy all relevant discriminators into CV report on batch task level;
javadoc

Reported by daxenberger.j on 2015-04-07 14:05:51

daxenberger commented 9 years ago
This issue was updated by revisions r1484 and r1485.
Added new measures; corrected calculation of multi-label scores; documentation

Reported by daxenberger.j on 2015-05-18 10:25:04

Horsmann commented 8 years ago

@daxenberger Is there anything left to do or should we move this one to 0.9.0 ?

daxenberger commented 8 years ago

move to the next milestone. this hasn't been tested and integrated properly yet.

Horsmann commented 8 years ago

If you tell me what is left to do or where to start I would continue integration. Which is the next step here?

daxenberger commented 8 years ago

A rough roadmap:

Horsmann commented 8 years ago

@daxenberger I started a new branch, making SvmHmm my guinea pig. SvmHmm outputs some confusion matrix stuff of its own. Can this be removed now, too? I am not sure if it is even correct, since it is also untested, but it should be handled by the new evaluation module now anyway?

Except for the additional files SvmHmm creates, the integration is not so hard. I can remove SVMHMMClassificationReport and SVMHMMBatchCrossValidationReport once the interface is adapted, right?

daxenberger commented 8 years ago

[...] but it should be handled by the new evaluation module now anyway?

yes, in theory. I'm not sure how well this works atm, so maybe deprecate the old reports rather than removing them completely.

I can remove SVMHMMClassificationReport and SVMHMMBatchCrossValidationReport once the interface is adapted, right?

Same here: I'd prefer to remove them from demos etc., but instead of completely deleting them, better to deprecate.

Horsmann commented 8 years ago

What do we do with the Mallet module? It has been deprecated since 0.7.0. All changes to the API have to be implemented there too. Maybe it's time to remove it entirely? Whoever needs Mallet should either use 0.7.0 or 0.8.0?

reckart commented 8 years ago

Why was it deprecated?

Horsmann commented 8 years ago

I think the code is a bit messy, some things never seemed to have worked (sequence classification), and we have Weka and Crfsuite, which cover everything. If someone wants to revive it, it should be reimplemented from scratch imo.

zesch commented 8 years ago

Very slow. Was not really usable for real problems.

Horsmann commented 8 years ago

@daxenberger All MLAs are supposed to have the threshold as the last entry in the id2outcome file? What value do I set if the MLA doesn't have such a parameter?

daxenberger commented 8 years ago

yeah, that is necessary to have a common format for all kinds of learning modes. The threshold will be ignored if multi-labeling is not applied and should thus be set to -1 (or 0).
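As an illustration of this convention, here is a minimal sketch of writing such a line. The exact id2outcome layout shown is an assumption for illustration, not taken from the DKPro TC source; the point is only that every line carries a threshold field, filled with the dummy value -1 when the learner has no such parameter:

```java
// Sketch only -- the <id>=<prediction>;<gold>;<threshold> layout is a
// hypothetical stand-in for the real id2outcome format.
public class Id2OutcomeLine {

    public static String format(String docId, String prediction,
            String gold, double threshold) {
        return docId + "=" + prediction + ";" + gold + ";" + threshold;
    }

    public static void main(String[] args) {
        // single-label case: no real threshold, so write the dummy value -1
        System.out.println(format("doc17", "POS", "NEG", -1));
    }
}
```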

Horsmann commented 8 years ago

@daxenberger ok thx. Do we adapt Mallet too? Maybe its time to remove it entirely?

zesch commented 8 years ago

How much effort is it to drag it along?

Horsmann commented 8 years ago

About the same as for the other modules - half a day. I am more concerned about continuing to maintain a module that has been dead for 2 releases. Changes have been reflected in Mallet so far if they affected interfaces and made it mandatory to change it there too (otherwise Jenkins would fail), but there are no unit tests for the module. I find its whole state a bit questionable; imo it is rotting code that causes effort with every change, and no one(?) gets any advantage from the effort spent. So I would suggest removing it :)

zesch commented 8 years ago

Ok for me

reckart commented 8 years ago

Might be worth noting that a new version of Mallet was released a few days ago:

http://search.maven.org/#artifactdetails%7Ccc.mallet%7Cmallet%7C2.0.8%7Cjar

https://github.com/mimno/Mallet

Horsmann commented 8 years ago

What I have done so far:

Ok, here is a list of issues I am not sure how to handle:

reckart commented 8 years ago

I can't fix the Groovy Test cases - support is not available for my Eclipse version @daxenberger can you fix those?

Out of curiosity, what Eclipse version are you using?

Horsmann commented 8 years ago

4.4 Luna; this one isn't working: https://marketplace.eclipse.org/content/groovygrails-tool-suite-ggts-eclipse

reckart commented 8 years ago

I have been using Luna before and now I am using Mars.

For Luna: http://dist.springsource.org/release/GRECLIPSE/e4.4/ For Mars: http://dist.springsource.org/snapshot/GRECLIPSE/e4.5/

See also: https://github.com/groovy/groovy-eclipse/wiki

Horsmann commented 8 years ago

Sorry, not working; I tried those too. Installation fails with an exception.

Horsmann commented 8 years ago

I installed another Eclipse version - I don't understand what those Groovy test cases are supposed to do, which makes it hard to fix them. I could use a hand with those... @daxenberger help wanted

daxenberger commented 8 years ago

I can't fix the Groovy Test cases - support is not available for my Eclipse version @daxenberger can you fix those?

I'm compiling the groovy demos without problems under Eclipse Mars (4.5.2) with Groovy Compiler (1.8-2.4) 2.9.2.xx

WekaRegressionExperimentTest triggers an exception when calling a getLabel() method - no regression implemented yet?

See my response on the mailing list - there is no getLabel() method for regression.

Mallet module - see previous postings - is untouched at the moment

I am a bit reluctant to totally remove the module, since Mallet has become more active recently (see above). Maybe move the module into its own branch?

Weka has various other *Adapter classes with additionally reports - how to handle those?

The Prediction adapters are not needed anymore, since we have Save/LoadModel now. Statistics adapters are already ported to the new evaluation mode. Meka and Weka need to be distinguished due to differences in single-/multi-label mode.

daxenberger commented 8 years ago

I installed another Eclipse version - I don't understand was those Groovy test cases are supposed to do which makes it hard to fix it. I could need a hand for those... @daxenberger help wanted

which tests do you refer to? The tests in the groovy demo module do the same as in the java module: simply execute the demos.

Horsmann commented 8 years ago

the PairTwentyNewsgroupsDemo fails with a Stanford-triggered ClassCastException on my machine? The regression demo should fail too, due to the getLabel() issue.

```
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to [Ledu.stanford.nlp.util.Index;
    at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2164)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1249)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1226)
    at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2278)
    at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordNamedEntityRecognizer$1.produceResource(StanfordNamedEntityRecognizer.java:170)
    at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordNamedEntityRecognizer$1.produceResource(StanfordNamedEntityRecognizer.java:141)
```
Horsmann commented 8 years ago

@daxenberger Regarding Mallet: I can fork off a new branch from master with the current state. I would then remove the Mallet module from my issue50 branch, which removes the module once it is merged into master.

daxenberger commented 8 years ago

the PairTwentyNewsgroupsDemo fails with a Stanford triggered ClassCastException on my machine?

ok; I'll have a look at this

daxenberger commented 8 years ago

Regarding mallet: I can fork-off a new branch from master with the current state. I would then remove the mallet module from my issue50 branch which removes the module once it is merged into master

sounds good

Horsmann commented 8 years ago

Oookay. I set up a Jenkins job, and it seems Jenkins does not have the problems I have with the Groovy experiments. I think I am quite close to merging this branch into master. Seemingly everything is working. I added 2 of the easiest-to-implement regression measures for a few simple tests. The other bugs will probably only show themselves once we actually start using the module.

Horsmann commented 8 years ago

I merged the changes into master - an open todo on the checklist is the measure implementations (for regression).

@daxenberger I have the WekaFeatureValuesReport as a left-over. The report used the former result.txt; if this report is still needed, you would have to upgrade it to use the new module. Not really sure what it is supposed to do.

Maybe we should open a separate issue for adding the remaining measures?