google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

How much loss is optimal for a good model, and when to stop training #749

Open ankit201 opened 5 years ago

ankit201 commented 5 years ago

I am pre-training a Hindi language model on approximately 9 GB of data with a vocab of roughly 32k. My training parameters are:

- learning rate = 1e-4
- warmup steps = 40k
- max seq length = 128
- training steps = 500,000

I am training on a 32 GB Nvidia V100; my TensorBoard graph is below. How much more should I train, and when should I stop?

[Screenshot (197): TensorBoard pre-training loss curve]
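(For reference, a setup like this would correspond to roughly the following run_pretraining.py invocation. This is a sketch only: the file paths, the Hindi bert_config.json, and the batch size, which the post does not state, are placeholders rather than details from the original post.)

```bash
# Sketch of a pre-training command matching the reported hyperparameters.
# All paths and the batch size are placeholders, not from the original post.
python run_pretraining.py \
  --input_file=/path/to/hindi_tf_examples.tfrecord \
  --output_dir=/path/to/pretraining_output \
  --do_train=True \
  --bert_config_file=/path/to/hindi_bert_config.json \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=500000 \
  --num_warmup_steps=40000 \
  --learning_rate=1e-4
```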

jaymody commented 5 years ago

I wouldn't say there is a specific loss value at which you should stop training.

You could run run_pretraining.py on a subset of your data that your model hasn't seen, with do_eval set to True. It will save a text file with the loss and accuracy for next sentence prediction and masked word prediction. You can compare your model's accuracy to that of the published English BERT model and use that as a "goal" (digging through the paper, it looks like it had a next sentence prediction accuracy of 97-98%; I'm not sure about the masked word prediction accuracy).
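Concretely, that evaluation could look something like this (a sketch; the held-out tfrecord, config, and checkpoint paths are placeholders):

```bash
# Evaluate masked LM and next-sentence-prediction accuracy on held-out data.
# All paths below are placeholders.
python run_pretraining.py \
  --input_file=/path/to/heldout_tf_examples.tfrecord \
  --output_dir=/path/to/eval_output \
  --do_train=False \
  --do_eval=True \
  --bert_config_file=/path/to/hindi_bert_config.json \
  --init_checkpoint=/path/to/pretraining_output/model.ckpt-500000 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --max_eval_steps=1000
# The metrics (masked_lm_accuracy, next_sentence_accuracy, and the two
# losses) are written to eval_results.txt in the output directory.
```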

However, I'm not sure that's really the best way to go about it. I think it's better to simply test the accuracy of your model on a downstream task.
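For example, since XNLI includes Hindi, a downstream check could be a run_classifier.py fine-tuning run along these lines (a sketch; the choice of XNLI and all paths are my assumptions, and note that the XnliProcessor in this repo hard-codes language = "zh", so it would need a small change to point at Hindi):

```bash
# Fine-tune the pre-trained Hindi checkpoint on a downstream task.
# XNLI is an illustrative choice; all paths are placeholders.
python run_classifier.py \
  --task_name=XNLI \
  --do_train=true \
  --do_eval=true \
  --data_dir=/path/to/xnli \
  --vocab_file=/path/to/hindi_vocab.txt \
  --bert_config_file=/path/to/hindi_bert_config.json \
  --init_checkpoint=/path/to/pretraining_output/model.ckpt-500000 \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=5e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/path/to/xnli_output
```

The eval accuracy from a run like this is a more meaningful stopping signal than the raw pre-training loss.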

abhi060698 commented 5 years ago

What is your batch size? How long did this take?

xingchensong commented 5 years ago

Duplicate of #95