codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation
Apache License 2.0

Default model sizes are much smaller than BERT base #81

Open bertmaher opened 3 years ago

bertmaher commented 3 years ago

The base BERT model in https://arxiv.org/pdf/1810.04805.pdf uses 768 hidden features, 12 layers, and 12 attention heads (which are also the defaults in bert.py), while the default configuration in the argparser of __main__.py uses 256 hidden features, 8 layers, and 8 heads. Would it make sense to align the example script with the paper? I spent quite a while puzzling over my low GPU utilization with the default configuration before noticing the mismatch. Thanks!
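
For concreteness, here is a minimal sketch of what the aligned argparser defaults might look like. The flag names (`--hidden`, `--layers`, `--attn_heads`) are assumptions based on common conventions in this repo; the actual names in __main__.py may differ:

```python
# Sketch only: argparse defaults matching BERT-base from Devlin et al. 2018
# (arXiv:1810.04805): hidden size 768, 12 layers, 12 attention heads.
# Flag names below are assumed, not verified against __main__.py.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-hs", "--hidden", type=int, default=768,
                    help="hidden size of transformer model (BERT-base: 768)")
parser.add_argument("-l", "--layers", type=int, default=12,
                    help="number of transformer layers (BERT-base: 12)")
parser.add_argument("-a", "--attn_heads", type=int, default=12,
                    help="number of attention heads (BERT-base: 12)")
```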