adding "evaluate_on_test": true

akkikiki commented 5 years ago

SUMMARY: The current config does not write "test_UAS", "test_LAS", "test_UEM", "test_LEM", or "test_loss" to standard output or to metrics.json, even though "test_data_path" is specified in the config file. Following another example config in AllenNLP, the fix is simply to add "evaluate_on_test": true to the config.
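As a rough illustration of the change, here is a minimal Python sketch that adds the flag to a config held as a plain dictionary. The helper name `enable_test_evaluation` and the file paths are hypothetical, and the sketch assumes the config can be loaded as plain JSON; the actual change in this PR is just the one added line in the config file.

```python
import copy
import json


def enable_test_evaluation(config: dict) -> dict:
    """Return a copy of `config` with "evaluate_on_test" set to True.

    Hypothetical helper for illustration; it is not part of AllenNLP.
    """
    if "test_data_path" not in config:
        # Without a test set there is nothing to evaluate on.
        raise ValueError('"test_data_path" must be set for test evaluation')
    updated = copy.deepcopy(config)
    updated["evaluate_on_test"] = True
    return updated


# Placeholder paths, for illustration only.
example_config = {
    "train_data_path": "train.conllu",
    "validation_data_path": "dev.conllu",
    "test_data_path": "test.conllu",
}

print(json.dumps(enable_test_evaluation(example_config), indent=2))
```

The helper copies the config rather than mutating it, so the original dictionary stays available for comparison.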

TEST: Ran allennlp train $CONFIG -s $OUTPUT_DIR, where $CONFIG is borrowed and slightly modified from the test config in AllenNLP.

Example output of metrics.json without "evaluate_on_test": true (i.e., the current config)

{
  "best_epoch": 0,
  "peak_cpu_memory_MB": 203.681792,
  "training_duration": "0:00:00.155942",
  "training_start_epoch": 0,
  "training_epochs": 0,
  "epoch": 0,
  "training_UAS": 0.1038961038961039,
  "training_LAS": 0.012987012987012988,
  "training_UEM": 0.0,
  "training_LEM": 0.0,
  "training_loss": 5.463512897491455,
  "training_cpu_memory_MB": 199.585792,
  "validation_UAS": 0.175,
  "validation_LAS": 0.0,
  "validation_UEM": 0.0,
  "validation_LEM": 0.0,
  "validation_loss": 5.561254024505615,
  "best_validation_UAS": 0.175,
  "best_validation_LAS": 0.0,
  "best_validation_UEM": 0.0,
  "best_validation_LEM": 0.0,
  "best_validation_loss": 5.561254024505615
}

Example output of metrics.json with "evaluate_on_test": true

{
  "best_epoch": 0,
  "peak_cpu_memory_MB": 203.362304,
  "training_duration": "0:00:00.139146",
  "training_start_epoch": 0,
  "training_epochs": 0,
  "epoch": 0,
  "training_UAS": 0.1038961038961039,
  "training_LAS": 0.012987012987012988,
  "training_UEM": 0.0,
  "training_LEM": 0.0,
  "training_loss": 5.463512897491455,
  "training_cpu_memory_MB": 199.172096,
  "validation_UAS": 0.175,
  "validation_LAS": 0.0,
  "validation_UEM": 0.0,
  "validation_LEM": 0.0,
  "validation_loss": 5.561254024505615,
  "best_validation_UAS": 0.175,
  "best_validation_LAS": 0.0,
  "best_validation_UEM": 0.0,
  "best_validation_LEM": 0.0,
  "best_validation_loss": 5.561254024505615,
  "test_UAS": 0.175,
  "test_LAS": 0.0,
  "test_UEM": 0.0,
  "test_LEM": 0.0,
  "test_loss": 5.561254024505615
}
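
To check which metrics the flag adds, the two metrics.json outputs above can be compared by key. The following is a small sketch of that comparison; the function name `new_metric_keys` is made up for illustration, and the dictionaries are abbreviated versions of the outputs shown above.

```python
def new_metric_keys(before: dict, after: dict) -> list:
    """Return the sorted keys present in `after` but missing from `before`."""
    return sorted(set(after) - set(before))


# Abbreviated metrics from the run without "evaluate_on_test": true.
metrics_without_flag = {
    "best_epoch": 0,
    "validation_UAS": 0.175,
    "validation_loss": 5.561254024505615,
}

# The same metrics plus the test_* entries reported with the flag enabled.
metrics_with_flag = dict(
    metrics_without_flag,
    test_UAS=0.175,
    test_LAS=0.0,
    test_UEM=0.0,
    test_LEM=0.0,
    test_loss=5.561254024505615,
)

for key in new_metric_keys(metrics_without_flag, metrics_with_flag):
    print(key, metrics_with_flag[key])
```

In practice the two dictionaries would come from `json.load` on the metrics.json file of each output directory.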

TalSchuster commented 5 years ago

Thanks for the PR! In general, as Matt Gardner once told me (and I agree), they decided not to include this flag in the example configurations, since it's not good practice to look at test results while developing a model or tuning its hyperparameters. However, since this configuration is mainly meant for reproducing results, I agree that it should be there.

akkikiki commented 5 years ago

Oh, that makes sense. Thanks for merging it anyway!

TalSchuster / CrossLingualContextualEmb

adding "evaluate_on_test": true #10