SamuelLarkin closed this issue 4 years ago.
Hi,
I guess I had to open an issue to see the error staring me right in the face. I used --maximize-best-checkpoint-metric; that is why. Too much copy & paste from examples when learning a new tool :$
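The behavior can be sketched with a minimal best-metric tracker (hypothetical helper, not fairseq's actual implementation): when a loss is tracked with maximize=True, the running "best" never moves past the first epoch's value.

```python
# Minimal sketch of best-checkpoint metric tracking (hypothetical,
# not fairseq's actual code).
def track_best(values, maximize=False):
    """Return the running 'best' metric value after each epoch."""
    best = None
    history = []
    for v in values:
        # With maximize=True, a decreasing loss never beats epoch 1's value.
        if best is None or (v > best if maximize else v < best):
            best = v
        history.append(best)
    return history

# A validation loss that steadily decreases over training:
losses = [5.2, 4.1, 3.9, 3.58]

# As with --maximize-best-checkpoint-metric on a loss:
print(track_best(losses, maximize=True))   # [5.2, 5.2, 5.2, 5.2]

# Default minimize behavior tracks the improvement:
print(track_best(losses, maximize=False))  # [5.2, 4.1, 3.9, 3.58]
```

This is why both the reported valid_best_loss and the saved "best" checkpoint stay pinned to epoch 1 when the flag is set for a metric where lower is better.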
I don't get why we would need to specify max/min when this should easily be inferred from the metric/criterion itself?
Yep, glad this is resolved :)
> I don't get why we would need to specify max/min when this should easily be inferred from the metric/criterion itself?
Good point. I've created #1912 to track this.
🐛 Bug
"valid_best_loss" reported in the logs is not the best but rather the value of the first epoch.
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
Note that valid_best_loss at epoch 401 is the same as valid_loss at epoch 1 and stays like that throughout training. It looks like it should improve (get smaller) instead.
Code sample
Expected behavior
valid_best_loss should go down during training when the criterion is label_smoothed_cross_entropy; as such, it should not be stuck reflecting the value at epoch 1. In the example above, it should at least be 3.580.
Environment
Installation procedure
Resulting in
Additional context
Also, the saved best checkpoint is not the right one, since it is the checkpoint produced at epoch 1.