google-code-export / dkpro-tc

Automatically exported from code.google.com/p/dkpro-tc

Documentation clarification: Which readers are used in CV? #109

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Several TC demos currently demonstrate both Train/Test and CV.  For Train/Test, 
it is necessary to have a Reader for the training data and a Reader for the 
test data, but it's not clear what happens to the two readers in CV.  Are both 
of them used for CV?  Just the first?  Just the second?

If both are used, there is a real problem.  Suppose a user debugs an experiment 
on a reduced-size corpus in Train/Test mode, then points the Training Reader to 
the full corpus for the final CV experiments and simply switches from TrainTest 
to CV in the main experiment method, while leaving the debugging corpus in 
place for the Test Reader (or vice versa).  I am aware of multiple users doing 
this, assuming TC handles the situation intelligently.  Depending on the 
experiment set-up, this may cause an information leak in the final 
experiments.

If only one is used, then it is unclear which one.  CV on "DIM_READER_TRAIN"?  
CV on "DIM_READER_TEST"?  If the user was debugging the experiment with 
similar-sized-but-different corpora in each of the two readers, then switching 
to CV for final evaluation might accidentally evaluate on the wrong corpus.

Original issue reported on code.google.com by EmilyKJa...@gmail.com on 23 Mar 2014 at 12:18

GoogleCodeExporter commented 9 years ago
CV only uses the train reader.

Original comment by torsten....@gmail.com on 23 Mar 2014 at 8:04
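
To make the answer above concrete, here is a minimal sketch of the selection rule. It is not DKPro TC code: the class, the enum, and the readersUsed method are hypothetical, and the key strings are assumed to match the Groovy keys named in this thread. It only shows which configured readers are consulted in each mode.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: this is NOT DKPro TC code. The class, enum, and method
// names are hypothetical; only the reader key names come from the thread.
class ReaderSelectionSketch {

    // Key names as mentioned in the thread; the string values are assumed.
    static final String DIM_READER_TRAIN = "readerTrain";
    static final String DIM_READER_TEST = "readerTest";

    enum Mode { TRAIN_TEST, CROSS_VALIDATION }

    // Returns the reader keys whose data an experiment in the given mode reads.
    static List<String> readersUsed(Mode mode, Map<String, String> dimReaders) {
        if (!dimReaders.containsKey(DIM_READER_TRAIN)) {
            throw new IllegalArgumentException("A train reader is required in both modes");
        }
        if (mode == Mode.CROSS_VALIDATION) {
            // CV draws every fold from the train reader; a configured
            // test reader is ignored.
            return List.of(DIM_READER_TRAIN);
        }
        if (!dimReaders.containsKey(DIM_READER_TEST)) {
            throw new IllegalArgumentException("Train/Test needs a test reader");
        }
        return List.of(DIM_READER_TRAIN, DIM_READER_TEST);
    }

    public static void main(String[] args) {
        Map<String, String> dimReaders = new LinkedHashMap<>();
        dimReaders.put(DIM_READER_TRAIN, "full-corpus");   // hypothetical corpus labels
        dimReaders.put(DIM_READER_TEST, "debug-corpus");

        System.out.println(readersUsed(Mode.TRAIN_TEST, dimReaders));       // [readerTrain, readerTest]
        System.out.println(readersUsed(Mode.CROSS_VALIDATION, dimReaders)); // [readerTrain]
    }
}
```

Under this rule, a debugging corpus left in the test reader has no effect on a CV run, but whatever corpus the train reader points to is the one that gets evaluated.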

GoogleCodeExporter commented 9 years ago
It looks like in a Java experiment, such as TwentyNewsgroups, the Readers are 
labelled (DIM_READER_TRAIN, etc).  But I do not see any labelling in the Groovy 
TwentyNewsgroups when dimReaders is created.  Does TC's interpretation of 
which reader is which depend on the order of the readers in the list?

Original comment by EmilyKJa...@gmail.com on 23 Mar 2014 at 8:56

GoogleCodeExporter commented 9 years ago
in Groovy ...
readerTrain:
readerTest:

Original comment by torsten....@gmail.com on 23 Mar 2014 at 8:58

GoogleCodeExporter commented 9 years ago
Thanks, I missed that.
We should add documentation to the demos (or elsewhere: see Issue 110) to 
explain that CV only uses the train reader, and ignores the test reader.

Original comment by EmilyKJa...@gmail.com on 23 Mar 2014 at 9:04

GoogleCodeExporter commented 9 years ago
I have added some documentation in the code.
Additional documentation should be added to the docbook.

Original comment by torsten....@gmail.com on 25 Apr 2014 at 9:48

GoogleCodeExporter commented 9 years ago

Original comment by daxenber...@gmail.com on 13 Jun 2014 at 3:18