AnantLabs / dkpro-tc

Automatically exported from code.google.com/p/dkpro-tc

Evaluation module should not expect integer value in the id2outcome report #205

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
In the new evaluation module, when the data of the id2outcome report is 
processed, (at least) the SingleEvaluator expects the report to use integer 
values.

This snippet from the SingleEvaluator, starting at line 60, takes the 
gold/actual values and parses each of them as an integer:

        for (String line : readData) {
            // consists of: prediction, gold label, threshold
            // in the case of single label the threshold is ignored
            String[] splittedEvaluationData = line.split(";");
            int predictedClass = Integer.valueOf(splittedEvaluationData[0]);
            int goldClass = Integer.valueOf(splittedEvaluationData[1]);

            double oldValue = largeContingencyTable.get(goldClass).get(predictedClass);
            largeContingencyTable.get(goldClass).set(predictedClass, oldValue + 1);
        }
        return new SingleLargeContingencyTable(largeContingencyTable, class2number);

If the report has to be created with integer values for the gold/actual 
classes, it loses all human readability. If an integer mapping is required, 
the evaluation module should create this mapping automatically.
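
A minimal sketch of how such an automatic mapping could work, assuming each 
data line has the form "prediction;gold" as in the snippet above. The class 
LabelIndexer and its methods are hypothetical names for illustration and are 
not part of DKPro TC; a plain double[][] stands in for the contingency table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not DKPro TC API): assigns each distinct label
// a dense integer index in order of first appearance.
class LabelIndexer {
    private final Map<String, Integer> class2number = new LinkedHashMap<>();

    // Returns the index of the label, registering it if it is new.
    int indexOf(String label) {
        Integer index = class2number.get(label);
        if (index == null) {
            index = class2number.size();
            class2number.put(label, index);
        }
        return index;
    }

    int size() {
        return class2number.size();
    }

    // Counts (gold, predicted) pairs from lines of the form "prediction;gold".
    // A trailing threshold field, if present, is ignored.
    static double[][] buildTable(List<String> lines, LabelIndexer indexer) {
        // First pass: map labels to indices so the table size is known.
        List<int[]> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(";");
            int predicted = indexer.indexOf(parts[0].trim());
            int gold = indexer.indexOf(parts[1].trim());
            pairs.add(new int[] {gold, predicted});
        }
        // Second pass: rows are gold classes, columns are predicted classes.
        double[][] table = new double[indexer.size()][indexer.size()];
        for (int[] pair : pairs) {
            table[pair[0]][pair[1]] += 1;
        }
        return table;
    }
}
```

With this approach the report can keep its string labels, and numeric labels 
still work because they are simply treated as strings.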

Original issue reported on code.google.com by Tobias.H...@gmail.com on 28 Oct 2014 at 1:35

GoogleCodeExporter commented 9 years ago
As an example, the id2outcome report is currently expected to look like this:

#labels JJ NN  RT WRB HT URL PRP DT NNP NNS JJS UH JJR MD VPP VBD WP VBG (null) 
CC '' CD VBN RBR VBP IN WDT SYM NNPS ( ) , VB . VBZ RB PRP$ EX USR POS TO RP
10_unit0_10_12_\!=34;34
219_unit0_131_6_do=25;25
138_unit0_41_10_the=8;8

That is not readable at all anymore. The numerical gold-to-actual mapping 
should be created automatically by the evaluator when non-numerical labels 
are used.

The report should be allowed to look like this:
10_unit0_10_12_\!=pct;pct
219_unit0_131_6_do=V;V
138_unit0_41_10_the=DT;JJ
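
A rough sketch of reading such a readable report, under two assumptions that 
the thread does not confirm: that the file follows the Java properties format 
(which would explain the "#labels" comment line and the escaped "\!"), and 
that each value is "prediction;gold" in the order given in the SingleEvaluator 
comment. ReadableReportReader is a hypothetical name, not a DKPro TC class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical reader (not DKPro TC API) for a report with string labels.
class ReadableReportReader {
    // Returns {prediction, gold} for the instance id, or null if the id is absent.
    // Assumption: the report text is in java.util.Properties format.
    static String[] outcomeFor(String reportText, String instanceId) {
        Properties report = new Properties();
        try {
            // Lines starting with '#' (such as "#labels ...") are skipped as comments.
            report.load(new StringReader(reportText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = report.getProperty(instanceId);
        if (value == null) {
            return null;
        }
        String[] parts = value.split(";");
        return new String[] {parts[0], parts[1]};
    }
}
```

Looked up this way, the entry for 138_unit0_41_10_the yields the labels DT and 
JJ directly, without any integer in the file.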

Original comment by Tobias.H...@gmail.com on 28 Oct 2014 at 1:42

GoogleCodeExporter commented 9 years ago
If there is demand in the future for a human-readable id2outcome report, a 
corresponding file should be generated by the running module.

Original comment by Tobias.H...@gmail.com on 21 Nov 2014 at 3:50