Open yzhang123 opened 4 years ago
I pretrained both BERT uncased and BERT cased models on Wikipedia and BookCorpus using the same hyperparameters (those published for the uncased model), but my cased models perform worse than the Google checkpoints on downstream tasks. Did you pretrain the cased models differently? Could you share the hyperparameters?
Thanks!
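Not an official answer, but for reference, here is a sketch of the pretraining hyperparameters reported in the BERT paper and the google-research/bert README. Whether the cased checkpoints used exactly these settings is precisely what this issue is asking, so treat them as an assumption for the cased models. The `tokens_per_batch` helper is just an illustrative function, not part of the BERT codebase:

```python
# Sketch: pretraining hyperparameters as reported in the BERT paper and the
# google-research/bert README. Assumption: the cased checkpoints used the
# same settings -- this has not been confirmed by the maintainers.
PRETRAIN_HPARAMS = {
    "train_batch_size": 256,        # sequences per batch
    "num_train_steps": 1_000_000,   # roughly 40 epochs over the corpus
    "learning_rate": 1e-4,          # Adam with linear warmup then linear decay
    "num_warmup_steps": 10_000,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "weight_decay": 0.01,           # L2 weight decay on all layers
    "dropout": 0.1,                 # attention and hidden dropout
    "masked_lm_prob": 0.15,         # fraction of tokens masked for the MLM task
    "max_predictions_per_seq": 20,
    # Sequence-length schedule: 128 for ~90% of steps, then 512 for the rest.
    "max_seq_length_phase1": 128,
    "max_seq_length_phase2": 512,
}

def tokens_per_batch(hp, phase=1):
    """Illustrative helper: tokens per batch in each phase of the schedule."""
    seq_len = hp[f"max_seq_length_phase{phase}"]
    return hp["train_batch_size"] * seq_len

print(tokens_per_batch(PRETRAIN_HPARAMS, phase=1))  # 32768
print(tokens_per_batch(PRETRAIN_HPARAMS, phase=2))  # 131072
```

One thing worth checking when comparing against the Google checkpoints: the cased models must be pretrained *and* fine-tuned with `do_lower_case=False`, otherwise the vocabulary no longer matches the text and downstream scores drop sharply.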
I am experiencing a similar issue. I pre-trained on domain-specific data starting from BERT, but performance on downstream tasks is far below the results reported in another paper. Eager to know the reasons!