Closed JoaoLages closed 3 years ago
So basically, from my observation, there is no direct connection between the exact set match and the validation loss. Validation loss is computed at the token level (per decoding step), while exact match is computed over the whole sequence. We basically cannot tell which checkpoint is better by relying on the validation loss alone.
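To illustrate the token-level vs. sequence-level mismatch, here is a toy sketch with made-up probabilities (not taken from the actual model): a checkpoint can assign high probability to almost every correct token (low validation loss) yet decode one token wrong and lose the exact match, while a less confident checkpoint decodes every sequence correctly.

```python
import math

def token_loss(seqs):
    """Mean negative log-likelihood over all tokens (what validation loss measures)."""
    probs = [p for seq in seqs for p in seq]
    return -sum(math.log(p) for p in probs) / len(probs)

def exact_match(preds, refs):
    """Fraction of sequences decoded exactly right (what exact match measures)."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)

# Hypothetical validation set of two reference sequences.
refs = ["SELECT a FROM t", "SELECT b FROM t"]

# Checkpoint A: very confident on most tokens, but one low-probability
# token makes the second sequence decode incorrectly.
a_probs = [[0.99, 0.99, 0.99], [0.99, 0.40, 0.99]]
a_preds = ["SELECT a FROM t", "SELECT c FROM t"]  # second sequence wrong

# Checkpoint B: less confident per token, but both sequences decode right.
b_probs = [[0.70, 0.70, 0.70], [0.70, 0.70, 0.70]]
b_preds = list(refs)

print(token_loss(a_probs), exact_match(a_preds, refs))  # lower loss, EM = 0.5
print(token_loss(b_probs), exact_match(b_preds, refs))  # higher loss, EM = 1.0
```

Here checkpoint A wins on validation loss but loses on exact match, which is exactly why the two rankings can disagree.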
Hope this is helpful to you.
Yes, that makes sense. However, it is still a bit strange to me how the overfitted model gets picked 😅
I've tried to retrain your model and managed to get the same scores as you report. What I don't understand is:
I tried to get scores from another model checkpoint with a lower validation loss, and it yielded worse exact matches. So why does the model yield better scores if we overfit the training set?