ankit201 opened 5 years ago
I wouldn't say there is a specific loss value at which you should stop training.
You could run run_pretraining.py on a subset of your data that your model hasn't seen, with do_eval set to True. It should save a text file with the loss and accuracy for "next sentence prediction" and "masked word prediction". You can compare your model's accuracy to that of the published English BERT model and use that as a "goal" (digging through the paper, it looks like it reached a next-sentence-prediction accuracy of 97–98%; I'm not sure about the masked-word-prediction accuracy).
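For reference, an eval-only invocation might look like the sketch below. The bucket paths, checkpoint step, and eval-step count are placeholders you'd replace with your own; the flag names are those used by run_pretraining.py in the google-research/bert repo:

```shell
# Evaluate an existing BERT checkpoint on held-out pre-training data
# (TFRecords produced by create_pretraining_data.py that the model
# never trained on). Paths below are placeholders.
python run_pretraining.py \
  --input_file=gs://my_bucket/heldout/tf_examples.tfrecord \
  --output_dir=gs://my_bucket/pretraining_eval \
  --do_train=False \
  --do_eval=True \
  --bert_config_file=bert_config.json \
  --init_checkpoint=gs://my_bucket/pretraining_output/model.ckpt-500000 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --max_eval_steps=1000
```

If I remember correctly, this writes an eval_results.txt into the output directory with the masked-LM and next-sentence loss/accuracy numbers you can track over successive checkpoints.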
However, I'm not sure that's really the best way to go about it. I think it's better to simply test the accuracy of your model on a downstream task.
What is your batch size? How long did this take?
Duplicate of #95
I am pre-training a language model in Hindi on approx. 9 GB of data with a vocab of approx. 32k. My training parameters are: learning rate = 1e-4, warmup steps = 40k, max seq length = 128, training steps = 500,000. My TensorBoard graph is below. I am training on a 32 GB NVIDIA V100. How much more should I train, and when should I stop?