Hi,
the 'headline' labels slipped into the label list. This is indeed a bug.
The change to numerical values was implemented to account for the new
evaluation module, which expects to receive numerical values from
id2outcome.txt.
I was not aware that there is another file which already assigns numerical
values to the labels separately.
It is probably best to read the outcome-mapping file instead of compiling a
separate mapping in the report.
As far as I can see from the demo, the outcome mapping of train and test is
(always?) the same. It doesn't matter which file I read, right?
Original comment by Tobias.H...@gmail.com
on 18 Dec 2014 at 8:13
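For context, a minimal sketch of what reading an existing outcome-mapping file in the report (instead of rebuilding the mapping there) might look like, assuming one "label<TAB>number" pair per line; the real file layout may differ:

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Sketch: read an existing outcome-mapping file instead of rebuilding the
// label-to-number mapping inside the report. Assumes one "label<TAB>number"
// pair per line; the actual file layout may differ.
public class OutcomeMappingReaderSketch {

    public static Map<String, Integer> read(Path mappingFile) throws IOException {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (String line : Files.readAllLines(mappingFile)) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t");
            mapping.put(parts[0], Integer.valueOf(parts[1].trim()));
        }
        return mapping;
    }
}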
I'm not really sure about the TC policy for determining this mapping. It seems
that the class names are just sorted ... similar to the method
SmallContingencyTables.classNamesToMapping(..). Is that right?
If so, it might also make sense to sort the labels in the id2outcome.txt file
to prevent confusing users, or, maybe even better, to include the mapping in
the outcome file.
Original comment by christia...@googlemail.com
on 18 Dec 2014 at 8:20
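A minimal sketch of the sorting behaviour discussed above, assuming class names are simply sorted lexicographically and numbered consecutively (the actual policy of SmallContingencyTables.classNamesToMapping(..) is exactly what is being asked about here, so this is an assumption):

import java.util.*;

// Hypothetical helper mimicking the assumed behaviour: sort the distinct
// class names lexicographically and assign consecutive numeric ids.
public class LabelMappingSketch {

    public static Map<String, Integer> classNamesToMapping(Collection<String> classNames) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(classNames));
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            mapping.put(sorted.get(i), i);
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = classNamesToMapping(Arrays.asList("NPg", "JJ", "RB", "PPS", "TO"));
        System.out.println(m); // {JJ=0, NPg=1, PPS=2, RB=3, TO=4}
    }
}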
I wonder where else such a mapping is done. This looks like code duplication to
me: the feature extraction does it, the machine learning adapter does it, and
the evaluation module does it too.
Maybe this mapping should move somewhere else, as it becomes more important
with the new evaluation module?
Original comment by Tobias.H...@gmail.com
on 18 Dec 2014 at 11:06
The class-label-to-number mapping in id2outcome.txt (TestTask) and in
outcome-mapping.txt (ExtractFeaturesTask) should be independent; there is no
guarantee that they produce the same mapping. Furthermore, the
outcome-mapping.txt produced during training and the one produced during
testing can differ if the train and test sets contain different sets of class
labels.
I'm not sure what exactly outcome-mapping.txt is used for. Maybe Torsten can
help here.
I would opt not to mix the two mappings. Rather, I would suggest making the
mapping in id2outcome.txt explicit, i.e. instead of
#ID=PREDICTION;GOLDSTANDARD
#labels NPg JJ RB PPS TO ...
we should have
#ID=PREDICTION;GOLDSTANDARD
#labels 1=NPg 2=JJ 3=RB 4=PPS 5=TO
This needs to be fixed within the evaluation module (I'll open a separate
issue).
The problem originally addressed in this issue is a bug in one of the CRFSuite
reports and should be fixed there.
Original comment by daxenber...@gmail.com
on 18 Dec 2014 at 3:02
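To illustrate the explicit header format proposed above, a minimal sketch of writing and parsing such a "#labels" line (the class names and anything beyond the two header lines are assumptions, not the actual DKPro TC implementation):

import java.util.*;
import java.util.stream.Collectors;

// Sketch: serialize and parse the proposed explicit "#labels" header.
public class Id2OutcomeHeaderSketch {

    // Build e.g. "#labels 1=NPg 2=JJ 3=RB" from a number-to-label mapping.
    public static String writeLabelsHeader(Map<Integer, String> idToLabel) {
        return "#labels " + idToLabel.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(" "));
    }

    // Parse the header back into a number-to-label mapping.
    public static Map<Integer, String> parseLabelsHeader(String line) {
        Map<Integer, String> mapping = new LinkedHashMap<>();
        for (String token : line.replaceFirst("^#labels\\s+", "").split("\\s+")) {
            String[] parts = token.split("=", 2);
            mapping.put(Integer.valueOf(parts[0]), parts[1]);
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<Integer, String> idToLabel = new LinkedHashMap<>();
        idToLabel.put(1, "NPg");
        idToLabel.put(2, "JJ");
        idToLabel.put(3, "RB");
        String header = writeLabelsHeader(idToLabel);
        System.out.println(header);                    // #labels 1=NPg 2=JJ 3=RB
        System.out.println(parseLabelsHeader(header)); // {1=NPg, 2=JJ, 3=RB}
    }
}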
The last time I touched this, labels were still strings. So I cannot really
help here.
Original comment by torsten....@gmail.com
on 18 Dec 2014 at 3:07
OK, I will filter out the two wrong labels and update the report as suggested in #4.
Original comment by Tobias.H...@gmail.com
on 18 Dec 2014 at 3:33
Original comment by Tobias.H...@gmail.com
on 18 Dec 2014 at 4:11
Original issue reported on code.google.com by
christia...@googlemail.com
on 17 Dec 2014 at 4:34