The dev set is used to tune the number of training epochs and inference steps, as mentioned in the paper. Once you have found the best epoch on the dev set, you merge the train and dev sets, re-train the model on this union for that number of epochs, and evaluate on the test set.
In essence, you need to modify the config file accordingly to get the correct performance. Due to changes in the environments, however, the results might not be exactly the same.
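Concretely, the second-stage run would look something like the sketch below. The key names and paths are illustrative assumptions, not necessarily the exact ones in configs/parameters_cdr.yaml, so check them against the file; the point is that the training file becomes the concatenation of train + dev, the test file points to the real test split, and the epoch count is fixed to the best value found on dev:

```yaml
# Sketch of a second-stage config for the final run.
# Key names (train_data, test_data, epoch) and paths are assumptions and
# may not match the actual keys in configs/parameters_cdr.yaml.
train_data: data/CDR/processed/train_dev_filter.data  # train + dev concatenated
test_data:  data/CDR/processed/test_filter.data       # the real test split
epoch: 15  # placeholder; set this to the best epoch found on the dev set
```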
I added a few more details related to that; hope it's all clear now!
It seems that the actual test dataset is not used in the experiments.
In configs/parameters_cdr.yaml, dev_filter.data is set as the test file, so dev_filter.data is used in both the training and the testing procedure. In other words, you test the model on the dev dataset, as sketched below.
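For reference, the relevant line looks roughly like this (the key name and path are approximate, not copied verbatim from the shipped file):

```yaml
# Approximate content of configs/parameters_cdr.yaml as shipped;
# the exact key name and path may differ.
test_data: data/CDR/processed/dev_filter.data  # dev split wired in as the test file
```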
Most importantly, the performance is not as good as you report in the paper. On the dev dataset:
Loading model & parameters ... TEST | LOSS = 0.46038, ACC = 0.8370 , MICRO P/R/F1 = 0.5810 0.6482 0.6128 | TP/ACTUAL/PRED = 656 /1012 /1129 , TOTAL 5087 | 0h 00m 11s
And on the actual test dataset:
TEST | LOSS = 0.47821, ACC = 0.8263 , MICRO P/R/F1 = 0.5732 0.5947 0.5838 | TP/ACTUAL/PRED = 634 /1066 /1106 , TOTAL 5204 | 0h 00m 11s
Both are lower than the results reported in your paper. Is there anything wrong?