google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Appropriate training steps for fine-tuning on language model #1182

Open sajastu opened 3 years ago

sajastu commented 3 years ago

I want to fine-tune BERT-base-uncased as a language model on my custom dataset, which consists of around 80M tweets. I'm a bit puzzled about how many training steps I should set so that the model is trained optimally (neither under- nor over-fit). The README says it should practically be around 10k steps or more, but what about large data collections such as the one I have? Does anybody have an estimate?
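For scale, here is a rough back-of-envelope calculation of how many steps a single pass over 80M examples takes. The batch size is only the default `--train_batch_size` from `run_pretraining.py` / `run_classifier.py`, not a recommendation, and the 10% warmup mirrors the repo's default `warmup_proportion`:

```python
# Back-of-envelope step count for one epoch over ~80M tweets.
# Batch size is just the repo default; the real value depends on your hardware.
num_examples = 80_000_000      # ~80M tweets
train_batch_size = 32          # default --train_batch_size in this repo
num_epochs = 1

steps_per_epoch = num_examples // train_batch_size
num_train_steps = steps_per_epoch * num_epochs
num_warmup_steps = int(num_train_steps * 0.1)   # ~10% warmup, as in warmup_proportion

print(num_train_steps)    # 2,500,000 steps for a single epoch at batch size 32
print(num_warmup_steps)   # 250,000
```

So even one epoch at batch size 32 is already far beyond the ~10k steps mentioned in the README.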

damlitos commented 3 years ago

Hi sajastu,

I think it will depend on the following. If you fine-tune by freezing the language-model layers, it shouldn't require many epochs. If you fine-tune by re-training all layers, it will need more. And if you first continue training the language model itself (by re-training a checkpoint on your own custom dataset), it will need many more epochs still. The best thing would be to find the optimal number of epochs by experimenting, and additionally you can always set an early-stopping criterion, e.g. if the loss doesn't improve by X amount within Y epochs, stop training (see the sketch below).
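For the early-stopping part, here is a minimal sketch using `tf.estimator`. It assumes `estimator`, `train_input_fn`, and `eval_input_fn` are built the same way `run_classifier.py` builds them (they are not defined here), and it uses a step window rather than an "X improvement within Y epochs" rule, since that is what the built-in hook supports; the thresholds are placeholders you'd tune, and the TPUEstimator setup in this repo may need small adjustments:

```python
import tensorflow as tf

# estimator, train_input_fn, and eval_input_fn are assumed to be constructed
# as in run_classifier.py; they are not defined in this sketch.

# Stop training if the evaluation loss has not decreased for 50k steps.
early_stopping = tf.estimator.experimental.stop_if_no_decrease_hook(
    estimator,
    metric_name="loss",
    max_steps_without_decrease=50_000,  # patience window, in training steps
    min_steps=10_000)                   # don't stop during warmup

tf.estimator.train_and_evaluate(
    estimator,
    train_spec=tf.estimator.TrainSpec(train_input_fn, hooks=[early_stopping]),
    eval_spec=tf.estimator.EvalSpec(eval_input_fn))
```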

I hope this helps. Good luck