google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

BERT pre-training approach #1376

Open shu1273 opened 1 year ago

shu1273 commented 1 year ago

Hi,

I have started working with the BERT model. Does anyone know what BERT's pre-training accuracy (not fine-tuned) was with a 100-0-0 masking approach versus the 80-10-10 approach? I could not find it anywhere. I basically understand why the 80-10-10 approach is implemented, but did the authors run any experiments to figure this out?
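
For context, this is roughly how I understand the 80-10-10 rule for each token position selected for prediction; the sketch below is only illustrative (the helper `mask_token` and the toy vocabulary are hypothetical, not the repo's actual code in `create_pretraining_data.py`):

```python
import random

# Illustrative sketch of the 80-10-10 rule applied to each token position
# selected for prediction (names here are hypothetical, not the repo's code).
# A 100-0-0 scheme would simply return "[MASK]" every time instead.
def mask_token(original_token, vocab_words, rng=random):
    r = rng.random()
    if r < 0.8:
        return "[MASK]"                 # 80%: replace with [MASK]
    elif r < 0.9:
        return original_token           # 10%: keep the original token
    return rng.choice(vocab_words)      # 10%: replace with a random token

# Toy usage: pick ~15% of positions for prediction and apply the rule.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
print([mask_token(t, vocab) if random.random() < 0.15 else t for t in tokens])
```

What I am asking is whether the paper or the authors ever reported pre-training accuracy numbers comparing this rule against always using `[MASK]`.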