dbmdz / berts

DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models
MIT License

German BERT Dataset sampling #16

Open Phil1108 opened 4 years ago

Phil1108 commented 4 years ago

Hi, did you sample each dataset (Wikipedia, Common Crawl, Subtitles etc.) equally during German BERT training? OpenAI uses unequal sampling, which may lead to better results, as stated in the GPT-3 paper:

Note that during training, datasets are not sampled in proportion to their size, but rather datasets we view as higher-quality are sampled more frequently, such that CommonCrawl and Books2 datasets are sampled less than once during training, but the other datasets are sampled 2-3 times. This essentially accepts a small amount of overfitting in exchange for higher quality training data.

If yes, which parameters did you use?

[GPT-3 dataset sampling table]
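For reference, a minimal sketch of what such quality-weighted sampling could look like. The file names and weights below are made up for illustration; they are not the GPT-3 values or anything from the dbmdz training setup:

```python
import random

# Hypothetical corpus files and sampling weights (illustrative only).
# A higher weight means the corpus is drawn from more often, analogous to
# the "weight in training mix" column of the GPT-3 table: small but clean
# corpora get oversampled, large noisy ones see less than one full pass.
CORPORA = {
    "wikipedia.txt":     5.0,   # high quality, small -> oversampled
    "opensubtitles.txt": 2.0,
    "common_crawl.txt":  1.0,   # large but noisy -> undersampled
}

def weighted_document_stream(corpora, num_documents, seed=42):
    """Yield documents by first picking a corpus according to its weight,
    then drawing a random document from that corpus (one document per line)."""
    rng = random.Random(seed)
    names = list(corpora)
    weights = [corpora[name] for name in names]
    docs = {
        name: open(name, encoding="utf-8").read().splitlines()
        for name in names
    }
    for _ in range(num_documents):
        corpus = rng.choices(names, weights=weights, k=1)[0]
        yield rng.choice(docs[corpus])

if __name__ == "__main__":
    # Print the first 80 characters of a few sampled documents.
    for doc in weighted_document_stream(CORPORA, num_documents=10):
        print(doc[:80])
```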

stefan-it commented 4 years ago

Hi @Phil1108 ,

I didn't use a specific sampling method (so all parts are sampled equally). But I think this could be interesting for future work, e.g. to see the effects on downstream tasks :)

Phil1108 commented 4 years ago

@stefan-it Okay, thanks. Then I'll give it a try and see how it performs in comparison to your models.