Closed JoaoLages closed 3 years ago
So basically, from my observation, there is no direct connection between the exact set match and the validation loss. Validation loss is computed at the token level (per decoding step), while exact match is computed over the whole sequence. We basically cannot tell which checkpoint is better by relying on the validation loss alone.
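To illustrate the token-level vs. sequence-level mismatch, here is a toy sketch with made-up probabilities (not taken from the actual model): a checkpoint can assign high probability to almost every correct token (low validation loss) yet decode one token wrong and lose the exact match, while a less confident checkpoint decodes every sequence correctly.

```python
import math

def token_loss(seqs):
    """Mean negative log-likelihood over all tokens (what validation loss measures)."""
    probs = [p for seq in seqs for p in seq]
    return -sum(math.log(p) for p in probs) / len(probs)

def exact_match(preds, refs):
    """Fraction of sequences decoded exactly right (what exact match measures)."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)

# Hypothetical validation set of two reference sequences.
refs = ["SELECT a FROM t", "SELECT b FROM t"]

# Checkpoint A: very confident on most tokens, but one low-probability
# token makes the second sequence decode incorrectly.
a_probs = [[0.99, 0.99, 0.99], [0.99, 0.40, 0.99]]
a_preds = ["SELECT a FROM t", "SELECT c FROM t"]  # second sequence wrong

# Checkpoint B: less confident per token, but both sequences decode right.
b_probs = [[0.70, 0.70, 0.70], [0.70, 0.70, 0.70]]
b_preds = list(refs)

print(token_loss(a_probs), exact_match(a_preds, refs))  # lower loss, EM = 0.5
print(token_loss(b_probs), exact_match(b_preds, refs))  # higher loss, EM = 1.0
```

Here checkpoint A wins on validation loss but loses on exact match, which is exactly why the two rankings can disagree.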
Hope this is helpful to you.
Yes, that makes sense. However, it is still a bit strange to me how the overfitted model gets picked 😅
I've tried to retrain your model and managed to get the same scores as you report. What I don't understand is:
I tried to get scores from another model checkpoint with a lower validation loss, and it yielded worse exact matches. So why does the model yield better scores if we overfit the training set?