I think there is a bug in train(): the evaluate() call at the end of training uses the last trained model, not the one that scored best on the dev set. The results are probably very similar, though.
The best model is saved, however, so to get its actual accuracy on the eval set you have to rerun the code with --type eval --input_file "best_model".
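
To illustrate the fix I have in mind, here is a minimal, framework-free sketch. It is not the repository's actual train(); the names train_step, dev_score, toy_train_step and toy_dev_score are hypothetical stand-ins, and the "model" is just a dict. The point is only that the loop should hand back the snapshot that scored best on the dev set, and that snapshot is what the final evaluation should use.

```python
import copy


def train(model, train_step, dev_score, epochs):
    """Train for `epochs` epochs and return the best-on-dev snapshot and its score.

    `train_step` and `dev_score` are hypothetical callables standing in for
    the real training and dev-set evaluation code.
    """
    best_score = float("-inf")
    best_model = copy.deepcopy(model)
    for epoch in range(epochs):
        train_step(model, epoch)       # updates `model` in place
        score = dev_score(model)       # higher is better
        if score > best_score:
            best_score = score
            # Snapshot the weights now; `model` keeps changing afterwards.
            best_model = copy.deepcopy(model)
    return best_model, best_score


# Toy setup (assumption): the dev score peaks at w == 2 and then degrades.
def toy_train_step(model, epoch):
    model["w"] = float(epoch)


def toy_dev_score(model):
    return -abs(model["w"] - 2.0)


model = {"w": 0.0}
best_model, best_score = train(model, toy_train_step, toy_dev_score, epochs=5)

# `model` is the last trained model; `best_model` is the one to evaluate.
print("last:", model["w"], "best:", best_model["w"])
```

In the real code the equivalent change would presumably be to reload the saved best model before the final evaluate() call, instead of evaluating whatever the last epoch left behind.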