Separate reading logic from label annotation in TC readers

AnantLabs / dkpro-tc

Automatically exported from code.google.com/p/dkpro-tc

Separate reading logic from label annotation in TC readers #188

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

Currently, TC readers are responsible for reading content, creating the CAS, and 
assigning the appropriate labels (outcomes).
As discussed in today's meeting, these responsibilities should be separated so 
that plain DKPro Core readers from the IO modules can be reused as-is. To achieve 
this, we need to add an annotator (either by default or as a configurable 
parameter) that converts existing annotations (e.g. POS labels in the case of 
de.tudarmstadt.ukp.dkpro.tc.examples.io.BrownCorpusReader) into 
ClassificationOutcomes.
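
The proposed split can be illustrated with a minimal sketch in plain Java. It deliberately uses neither UIMA nor DKPro TC: `Token`, `PlainReader`, and `PosOutcomeAnnotator` are hypothetical stand-ins for the CAS, a DKPro Core reader, and the new annotator, and the simplified `word/TAG` input format is assumed only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a CAS entry: a token plus the annotation a reader produced.
final class Token {
    final String text;
    final String pos;   // annotation created by the reader
    String outcome;     // label assigned later, by a separate step

    Token(String text, String pos) {
        this.text = text;
        this.pos = pos;
    }
}

// Step 1: a plain reader. It knows nothing about classification outcomes.
final class PlainReader {
    // Parses whitespace-separated "word/TAG" items (assumed input format).
    static List<Token> read(String line) {
        List<Token> tokens = new ArrayList<>();
        for (String item : line.trim().split("\\s+")) {
            int slash = item.lastIndexOf('/');
            if (slash <= 0) {
                continue; // skip items without a word or without a tag separator
            }
            tokens.add(new Token(item.substring(0, slash), item.substring(slash + 1)));
        }
        return tokens;
    }
}

// Step 2: a separate annotator that converts existing annotations into outcomes.
final class PosOutcomeAnnotator {
    static void process(List<Token> tokens) {
        for (Token t : tokens) {
            t.outcome = t.pos;
        }
    }
}

final class ReaderSeparationSketch {
    public static void main(String[] args) {
        List<Token> tokens = PlainReader.read("The/at jury/nn said/vbd");
        PosOutcomeAnnotator.process(tokens);
        for (Token t : tokens) {
            System.out.println(t.text + " -> " + t.outcome);
        }
    }
}
```

Because the reader never touches `outcome`, it could be swapped for any other reader that produces POS annotations without changing the labeling step.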

Original issue reported on code.google.com by daxenber...@gmail.com on 7 Oct 2014 at 1:31

GoogleCodeExporter commented 9 years ago

This is a very good move.

One question: is this in addition to the old mode, or should it replace it?
We have some kinds of data where the label might not be easy to extract at a 
later stage in the pipeline.

Original comment by torsten....@gmail.com on 7 Oct 2014 at 2:03

GoogleCodeExporter commented 9 years ago

This will replace the old "mode" (i.e. the way we currently use readers).
As suggested by Richard, we should keep the "Label Annotator" flexible, so that 
users can customize it for more complicated cases (e.g. when the classification 
outcome should be a combination of annotations).
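
One way such flexibility might look, sketched in plain Java without UIMA or DKPro TC: the annotator takes a user-supplied function that maps existing annotations to an outcome. `Unit`, `LabelAnnotator`, and the annotation keys `pos` and `ne` are hypothetical names chosen for illustration, not part of any existing API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for one classification unit and its existing annotations.
final class Unit {
    final Map<String, String> annotations = new HashMap<>();
    String outcome;
}

// A label annotator whose mapping from annotations to outcome is supplied by the user.
final class LabelAnnotator {
    private final Function<Map<String, String>, String> strategy;

    LabelAnnotator(Function<Map<String, String>, String> strategy) {
        this.strategy = strategy;
    }

    void process(Unit unit) {
        unit.outcome = strategy.apply(unit.annotations);
    }
}

final class CustomLabelSketch {
    public static void main(String[] args) {
        Unit unit = new Unit();
        unit.annotations.put("pos", "NN");
        unit.annotations.put("ne", "PER");

        // Simple case: copy a single annotation into the outcome.
        new LabelAnnotator(a -> a.get("pos")).process(unit);
        System.out.println(unit.outcome);

        // Customized case: combine two annotations into one outcome.
        new LabelAnnotator(a -> a.get("pos") + "_" + a.get("ne")).process(unit);
        System.out.println(unit.outcome);
    }
}
```

In a real pipeline the strategy would more likely be a configurable annotator class than a lambda, but the idea is the same: the default covers the simple case and users override it when the outcome depends on several annotations.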

Original comment by daxenber...@gmail.com on 7 Oct 2014 at 2:24

GoogleCodeExporter commented 9 years ago

Original comment by daxenber...@gmail.com on 11 Dec 2014 at 3:46

Added labels: Milestone-Release0.8.0
Removed labels: Milestone-Release0.7.0