Maybe the NSP accuracy remains high and stable because NSP is an easier task than MLM.
There is still a sudden rise in the NSP loss, which increases from 0.010 (or less) to 0.017 at step 65700. The MLM loss also increases noticeably, from 0.7 to 1.0. I think it is safe to say that both MLM and NSP suffer from the degradation. The only explanation I can come up with is that a small fraction of the corpus contains errors.
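If it helps, this is a rough sketch of the sanity check I would run over the corpus to test that hypothesis. The file name `corpus.txt` is a placeholder, and it assumes the book_review_bert.txt layout (one sentence per line, UTF-8, a blank line between documents):

```python
def check_corpus(path="corpus.txt"):
    """Flag lines that are not valid UTF-8 and documents with fewer than
    two sentences (NSP needs at least two sentences per document)."""
    suspicious = []
    doc_start, doc_lines = 1, 0
    last_lineno = 0
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            last_lineno = lineno
            try:
                text = raw.decode("utf-8").strip()
            except UnicodeDecodeError:
                suspicious.append((lineno, "not valid UTF-8"))
                continue
            if text:                      # sentence line
                doc_lines += 1
            else:                         # blank line ends a document
                if 0 < doc_lines < 2:
                    suspicious.append((doc_start, "document with a single sentence"))
                doc_start, doc_lines = lineno + 1, 0
    if 0 < doc_lines < 2:                 # last document if no trailing blank line
        suspicious.append((doc_start, "document with a single sentence"))
    for lineno, reason in suspicious[:50]:
        print(f"line {lineno}: {reason}")
    print(f"{len(suspicious)} suspicious spots in {last_lineno} lines")

if __name__ == "__main__":
    check_corpus()
```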
I am training a BERT-base model for Chinese with the default MLM and NSP tasks. I am trying to train the model for 96k steps to see whether it benefits from a longer training schedule. However, between step 65600 and step 65700, the MLM accuracy drops sharply from 0.827 to 0.774, while the NSP accuracy remains high and stable. I am wondering how this drop can happen.
I have around 226k sentences in the original corpus, and each one is split into two parts at the middle, following the format of book_review_bert.txt. During data preprocessing, I changed dup_factor from 5 to 50 to increase diversity. The effective batch size is 16 (args.batch_size) x 2 (args.world_size) x 1 (args.accumulation_steps) = 32.
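For reference, here is a back-of-the-envelope estimate of how much of the duplicated dataset has been consumed when the drop appears. It assumes roughly one training instance per document per duplication, which is only an approximation; the real count depends on seq_length and how preprocessing packs sentences:

```python
# Rough estimate of data consumed by the step where the drop appears.
batch_size = 16          # args.batch_size
world_size = 2           # args.world_size
accumulation_steps = 1   # args.accumulation_steps
effective_batch = batch_size * world_size * accumulation_steps   # 32 sequences per step

documents = 226_000      # approximate size of the original corpus
dup_factor = 50
instances = documents * dup_factor                                # ~11.3M instances (assumed)

drop_step = 65_600
seen = drop_step * effective_batch                                # ~2.1M instances
print(f"effective batch size: {effective_batch}")
print(f"instances seen by step {drop_step}: {seen:,}")
print(f"fraction of dataset: {seen / instances:.1%}")             # roughly 19%
```

If that estimate is in the right ballpark, the drop happens well within the first pass over the duplicated dataset, so a problematic slice of the data being reached around that step range seems at least plausible.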