Closed by AdamVVG 5 years ago
You are right. Since the model is BERT and the recommended number of epochs is 3 or 4, we simply fix the number of epochs at 4 and output the test-set results during training. We take the epoch-4 results for evaluation. You can also evaluate on the dev set first and choose the model that performs best on it.
By the way, at the beginning of our experiments we did evaluate on the dev set for observation, and that is undoubtedly necessary. However, in the subsequent experiments, since we had already fixed the number of epochs at 4, we output the test-set results directly during training for convenience.
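For anyone who wants to pick the checkpoint on the dev set instead of fixing the epoch, here is a minimal sketch of that selection loop. It is not the code from this repository: `select_best_epoch`, `train_one_epoch`, and `evaluate_dev` are hypothetical names, and the two callables stand in for whatever training and dev-set evaluation routines you already have (for example, one built on `get_dev_examples`).

```python
from typing import Callable, Dict, Tuple


def select_best_epoch(
    num_epochs: int,
    train_one_epoch: Callable[[int], None],
    evaluate_dev: Callable[[int], float],
) -> Tuple[int, Dict[int, float]]:
    """Train for num_epochs and return the epoch with the highest dev score.

    train_one_epoch and evaluate_dev are hypothetical callbacks supplied by
    the caller; evaluate_dev must return a score where higher is better.
    """
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")

    dev_scores: Dict[int, float] = {}
    best_epoch = 0  # 0 means "no epoch evaluated yet"
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(epoch)
        dev_scores[epoch] = evaluate_dev(epoch)
        # Strict '>' keeps the earliest epoch when scores tie.
        if best_epoch == 0 or dev_scores[epoch] > dev_scores[best_epoch]:
            best_epoch = epoch
    return best_epoch, dev_scores


# Placeholder dev scores for illustration only, not real results.
placeholder_scores = {1: 0.80, 2: 0.85, 3: 0.84, 4: 0.83}
best, scores = select_best_epoch(
    num_epochs=4,
    train_one_epoch=lambda epoch: None,  # no-op stand-in for training
    evaluate_dev=lambda epoch: placeholder_scores[epoch],
)
```

With this setup the test set is touched only once, after the epoch has been chosen on the dev set, so the reported test numbers are not influenced by the selection.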
I have tried implementing your code and it seems to work fine. There is, however, one issue, I think... You have a dev set in the sentihood-dev.json file which you never seem to use. You define a function, get_dev_examples, in the processor.py file, but you never call it in either run_classifier_TABSA.py or evaluation.py. As far as I understand, you use the test set for evaluation both during and after training. Is this right, or am I missing something?