TalSchuster / CrossLingualContextualEmb

Cross-Lingual Alignment of Contextual Word Embeddings
MIT License
98 stars 9 forks

adding "evaluate_on_test": true #10

Closed akkikiki closed 5 years ago

akkikiki commented 5 years ago

SUMMARY: The current config does not write "test_UAS", "test_LAS", "test_UEM", "test_LEM", or "test_loss" to standard output or to metrics.json, even when "test_data_path" is specified in the config file. Following another example config in AllenNLP, the fix is simply to add "evaluate_on_test": true to the config.
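As a rough illustration of the change, here is a minimal Python sketch that adds the flag to a config loaded as a plain dict. It assumes the flag sits at the top level next to "test_data_path"; the helper name `enable_test_evaluation`, the trimmed config, and the file paths are all hypothetical and not taken from this repository.

```python
import json


def enable_test_evaluation(config: dict) -> dict:
    """Return a copy of `config` with "evaluate_on_test" set to True,
    but only if a test set is actually configured."""
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    if "test_data_path" in updated:
        updated["evaluate_on_test"] = True
    return updated


# Hypothetical, heavily trimmed config; the paths are placeholders.
example_config = {
    "train_data_path": "train.conllu",
    "validation_data_path": "dev.conllu",
    "test_data_path": "test.conllu",
}

patched_config = enable_test_evaluation(example_config)
print(json.dumps(patched_config, indent=2))
```

In practice the edit is a one-line addition to the config file itself; the sketch only makes explicit that the flag matters when "test_data_path" is present.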

TEST: I ran allennlp train $CONFIG -s $OUTPUT_DIR, where $CONFIG is adapted, with slight modifications, from the test config in AllenNLP.

  1. Example output of metrics.json without "evaluate_on_test": true (i.e., the current config)

    {
    "best_epoch": 0,
    "peak_cpu_memory_MB": 203.681792,
    "training_duration": "0:00:00.155942",
    "training_start_epoch": 0,
    "training_epochs": 0,
    "epoch": 0,
    "training_UAS": 0.1038961038961039,
    "training_LAS": 0.012987012987012988,
    "training_UEM": 0.0,
    "training_LEM": 0.0,
    "training_loss": 5.463512897491455,
    "training_cpu_memory_MB": 199.585792,
    "validation_UAS": 0.175,
    "validation_LAS": 0.0,
    "validation_UEM": 0.0,
    "validation_LEM": 0.0,
    "validation_loss": 5.561254024505615,
    "best_validation_UAS": 0.175,
    "best_validation_LAS": 0.0,
    "best_validation_UEM": 0.0,
    "best_validation_LEM": 0.0,
    "best_validation_loss": 5.561254024505615
    }
  2. Example output of metrics.json with "evaluate_on_test": true

    {
    "best_epoch": 0,
    "peak_cpu_memory_MB": 203.362304,
    "training_duration": "0:00:00.139146",
    "training_start_epoch": 0,
    "training_epochs": 0,
    "epoch": 0,
    "training_UAS": 0.1038961038961039,
    "training_LAS": 0.012987012987012988,
    "training_UEM": 0.0,
    "training_LEM": 0.0,
    "training_loss": 5.463512897491455,
    "training_cpu_memory_MB": 199.172096,
    "validation_UAS": 0.175,
    "validation_LAS": 0.0,
    "validation_UEM": 0.0,
    "validation_LEM": 0.0,
    "validation_loss": 5.561254024505615,
    "best_validation_UAS": 0.175,
    "best_validation_LAS": 0.0,
    "best_validation_UEM": 0.0,
    "best_validation_LEM": 0.0,
    "best_validation_loss": 5.561254024505615,
    "test_UAS": 0.175,
    "test_LAS": 0.0,
    "test_UEM": 0.0,
    "test_LEM": 0.0,
    "test_loss": 5.561254024505615
    }
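To make the difference between the two outputs easy to check, here is a small Python sketch that lists the keys present in one metrics dict but not in the other. The helper name `added_metric_keys` is hypothetical, and the dicts are only a subset of the values shown above rather than the full metrics.json files.

```python
def added_metric_keys(before: dict, after: dict) -> list:
    """Return the sorted keys that appear in `after` but not in `before`."""
    return sorted(key for key in after if key not in before)


# Subset of the metrics.json produced without "evaluate_on_test": true.
without_flag = {
    "validation_UAS": 0.175,
    "validation_loss": 5.561254024505615,
}

# The same subset, plus the test metrics reported with the flag enabled.
with_flag = {
    **without_flag,
    "test_UAS": 0.175,
    "test_LAS": 0.0,
    "test_UEM": 0.0,
    "test_LEM": 0.0,
    "test_loss": 5.561254024505615,
}

new_keys = added_metric_keys(without_flag, with_flag)
print(new_keys)
```

With real runs, the two dicts could instead be loaded with json.load from the metrics.json file in each output directory.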
TalSchuster commented 5 years ago

Thanks for the PR! As Matt Gardner once told me (and I agree), the AllenNLP team decided not to include this flag in the example configurations, since it's not good practice to look at test results while developing a model or tuning its hyperparameters. However, since this configuration is mainly meant for reproducing results, I agree that it should be there.

akkikiki commented 5 years ago

Oh, that makes sense. Thanks for merging it anyway!