Closed: lizichao7 closed this issue 4 years ago
Hi @lizichao7
That's a good point. Sure, the results should be reported according to the validation performance. The per-epoch test results here are only meant to give an overall picture of the model's behavior and do not reflect the real performance. I have just fixed the code to avoid any confusion. Anyway, thank you for pointing that out.
Ok. Thanks for the clarification.
Hi,
I have a question about how the best model is selected in train.py, and I hope to get some clarification from the authors. Thanks in advance.
Looking at the code, my understanding is that you train for 200 epochs, evaluate the model on the test set after every epoch, and in the end report the best test accuracy achieved across those 200 epochs. Is my understanding correct? If so, I think this procedure is problematic. The correct way to do model selection is to pick the best model based on its performance on the validation set, evaluate that selected model on the test set once, and report that single result. Your procedure overestimates performance because every epoch's model is evaluated on the test set and the best result is selected.
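For concreteness, here is a minimal sketch of the selection procedure I have in mind. The `train_one_epoch` and `evaluate` helpers and the loader names are hypothetical placeholders, not functions from your train.py; the key assumption is that `val_loader` and `test_loader` are disjoint splits.

```python
import copy
import torch

best_val_acc = 0.0
best_state = None

for epoch in range(200):
    train_one_epoch(model, train_loader, optimizer)
    # Model selection happens on the validation set only.
    val_acc = evaluate(model, val_loader)
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state = copy.deepcopy(model.state_dict())

# The test set is touched exactly once, with the selected checkpoint.
model.load_state_dict(best_state)
test_acc = evaluate(model, test_loader)
print(f"best val acc: {best_val_acc:.4f}, test acc: {test_acc:.4f}")
```

This way the reported test accuracy is an unbiased estimate, since the test set never influences which checkpoint is chosen.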
Just to be clear, I think the proposed method in your paper is novel and valuable, and I really appreciate your effort in publicly releasing the code. I am asking this question because I am doing follow-up work on your paper, and I would like the performance comparison of different methods to be fair and accurate. If you agree with my statement, I hope you could fix the problem, update the code and results, and put an updated version of the paper on arXiv. Of course, if my understanding is incorrect, I would really appreciate any clarification.