google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

pretraining BERT CASED model gives lower accuracy than UNCASED #1069

Open · yzhang123 opened 4 years ago

yzhang123 commented 4 years ago

I pretrained both BERT uncased and BERT cased models on Wikipedia and BookCorpus, using the same hyperparameters for both (the ones I used for the uncased model), but my cased models perform worse than the Google checkpoints on downstream tasks. Did you pretrain the cased models differently? Could you share the hyperparameters?

Thanks!

ibrahimishag commented 4 years ago

> I pretrained both BERT uncased and BERT cased models on Wikipedia and BookCorpus, using the same hyperparameters for both (the ones I used for the uncased model), but my cased models perform worse than the Google checkpoints on downstream tasks. Did you pretrain the cased models differently? Could you share the hyperparameters?
>
> Thanks!

I am experiencing a similar issue. I pre-trained on domain-specific data, starting from the released BERT checkpoint, but performance on downstream tasks is far lower than the results reported in another paper. Eager to know the reasons!