The metrics of training stage are not the same as they are on the evaluation stage

jarork commented 3 years ago

During training I reached a rel_f1 of 0.52 and an ent_f1 of 0.73 at the step where a checkpoint was saved; however, evaluating that saved checkpoint on my test set gives F1 values of only 0.25 and 0.5.

I tried replacing the test set with the same validation set that I used during training, but the evaluation scores are still much lower than the training metrics (rel_f1 = 0.3, ent_f1 = 0.55).

What's more, the checkpoint's file creation time exactly matches the wandb wall time at which the checkpoint was saved, so I am probably not loading the wrong checkpoint.

How do you get accurate precision, recall, and F1 scores at the evaluation stage? Thanks.

131250208 commented 3 years ago

I have never run into this situation. It looks like a configuration problem. Check the evaluation settings to make sure that all hyperparameters are the same as in the training stage, except for batch_size and max_seq_length.
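
A minimal sketch of that check, assuming both sets of settings can be loaded as plain Python dicts. The helper name diff_configs and every key other than batch_size and max_seq_length are hypothetical and only illustrate the comparison; they are not taken from the repository's config files.

```python
# Sentinel used when a key exists in one config but not in the other.
MISSING = "<missing>"


def diff_configs(train_cfg, eval_cfg, allowed=("batch_size", "max_seq_length")):
    """Return {key: (train_value, eval_value)} for every setting that differs.

    Keys listed in `allowed` may legitimately differ between training and
    evaluation, so they are skipped.
    """
    mismatches = {}
    for key in sorted(set(train_cfg) | set(eval_cfg)):
        if key in allowed:
            continue
        train_val = train_cfg.get(key, MISSING)
        eval_val = eval_cfg.get(key, MISSING)
        if train_val != eval_val:
            mismatches[key] = (train_val, eval_val)
    return mismatches


# Illustrative values only; replace with the settings actually used.
train_config = {
    "batch_size": 24,
    "max_seq_length": 100,
    "encoder": "bert-base-cased",
    "tagging_scheme": "A",
    "lowercase": False,
}
eval_config = {
    "batch_size": 32,
    "max_seq_length": 512,
    "encoder": "bert-base-cased",
    "tagging_scheme": "B",
}

for key, (train_val, eval_val) in diff_configs(train_config, eval_config).items():
    print(f"{key}: train={train_val!r} eval={eval_val!r}")
```

Any key this reports is a candidate explanation for the gap between the training-time and evaluation-time scores.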

jarork commented 3 years ago

Thank you mate, the problem has been perfectly solved.

131250208 / TPlinker-joint-extraction

The metrics of training stage are not the same as they are on the evaluation stage #42