google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Appropriate training steps for fine-tuning on language model #1182

Open sajastu opened 3 years ago

sajastu commented 3 years ago

I want to fine-tune BERT-base-uncased as a language model on my custom dataset, which consists of around 80M tweets. I'm a bit puzzled about how many training steps I should set so that the model is trained optimally (neither under- nor over-fit). The README says it should practically be around 10k steps or more, but what about large data collections such as the one I have? Does anybody have an estimate?
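For scale, here is a rough back-of-envelope calculation of how many steps a single pass over 80M examples takes. The batch size is only the default `--train_batch_size` from `run_pretraining.py` / `run_classifier.py`, not a recommendation, and the 10% warmup mirrors the repo's default `warmup_proportion`:

```python
# Back-of-envelope step count for one epoch over ~80M tweets.
# Batch size is just the repo default; the real value depends on your hardware.
num_examples = 80_000_000      # ~80M tweets
train_batch_size = 32          # default --train_batch_size in this repo
num_epochs = 1

steps_per_epoch = num_examples // train_batch_size
num_train_steps = steps_per_epoch * num_epochs
num_warmup_steps = int(num_train_steps * 0.1)   # ~10% warmup, as in warmup_proportion

print(num_train_steps)    # 2,500,000 steps for a single epoch at batch size 32
print(num_warmup_steps)   # 250,000
```

So even one epoch at batch size 32 is already far beyond the ~10k steps mentioned in the README.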

damlitos commented 3 years ago

Hi sajastu,

I think it will depend on the following. If you fine-tune by freezing the language-model layers, it shouldn't require many epochs. If you fine-tune by re-training all layers, it will need more. And if you first continue training the language model itself (by re-training a checkpoint on your own custom dataset), it will need many more epochs still. The best thing would be to find the optimal number of epochs by experimenting, and additionally you can always set an early-stopping criterion, e.g. if the loss doesn't improve by X amount within Y epochs, stop training (see the sketch below).
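For the early-stopping part, here is a minimal sketch using `tf.estimator`. It assumes `estimator`, `train_input_fn`, and `eval_input_fn` are built the same way `run_classifier.py` builds them (they are not defined here), and it uses a step window rather than an "X improvement within Y epochs" rule, since that is what the built-in hook supports; the thresholds are placeholders you'd tune, and the TPUEstimator setup in this repo may need small adjustments:

```python
import tensorflow as tf

# estimator, train_input_fn, and eval_input_fn are assumed to be constructed
# as in run_classifier.py; they are not defined in this sketch.

# Stop training if the evaluation loss has not decreased for 50k steps.
early_stopping = tf.estimator.experimental.stop_if_no_decrease_hook(
    estimator,
    metric_name="loss",
    max_steps_without_decrease=50_000,  # patience window, in training steps
    min_steps=10_000)                   # don't stop during warmup

tf.estimator.train_and_evaluate(
    estimator,
    train_spec=tf.estimator.TrainSpec(train_input_fn, hooks=[early_stopping]),
    eval_spec=tf.estimator.EvalSpec(eval_input_fn))
```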

I hope this helps. Good luck