Open yzhang123 opened 4 years ago
I pretrained both BERT uncased and BERT cased models on Wikipedia and BookCorpus using the same hyperparameters (those published for the uncased model), but my cased models perform worse than the Google checkpoints on downstream tasks. Did you pretrain the cased models differently? Could you share the hyperparameters?
Thanks!
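Not an official answer, but for reference, here is a sketch of the pretraining hyperparameters reported in the BERT paper and the google-research/bert README. Whether the cased checkpoints used exactly these settings is precisely what this issue is asking, so treat them as an assumption for the cased models. The `tokens_per_batch` helper is just an illustrative function, not part of the BERT codebase:

```python
# Sketch: pretraining hyperparameters as reported in the BERT paper and the
# google-research/bert README. Assumption: the cased checkpoints used the
# same settings -- this has not been confirmed by the maintainers.
PRETRAIN_HPARAMS = {
    "train_batch_size": 256,        # sequences per batch
    "num_train_steps": 1_000_000,   # roughly 40 epochs over the corpus
    "learning_rate": 1e-4,          # Adam with linear warmup then linear decay
    "num_warmup_steps": 10_000,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "weight_decay": 0.01,           # L2 weight decay on all layers
    "dropout": 0.1,                 # attention and hidden dropout
    "masked_lm_prob": 0.15,         # fraction of tokens masked for the MLM task
    "max_predictions_per_seq": 20,
    # Sequence-length schedule: 128 for ~90% of steps, then 512 for the rest.
    "max_seq_length_phase1": 128,
    "max_seq_length_phase2": 512,
}

def tokens_per_batch(hp, phase=1):
    """Illustrative helper: tokens per batch in each phase of the schedule."""
    seq_len = hp[f"max_seq_length_phase{phase}"]
    return hp["train_batch_size"] * seq_len

print(tokens_per_batch(PRETRAIN_HPARAMS, phase=1))  # 32768
print(tokens_per_batch(PRETRAIN_HPARAMS, phase=2))  # 131072
```

One thing worth checking when comparing against the Google checkpoints: the cased models must be pretrained *and* fine-tuned with `do_lower_case=False`, otherwise the vocabulary no longer matches the text and downstream scores drop sharply.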
I am experiencing a similar issue. I pre-trained on domain-specific data starting from BERT, but performance on downstream tasks is far below the results reported in another paper. Eager to know the reasons!