Separius / BERT-keras

Keras implementation of BERT with pre-trained weights
GNU General Public License v3.0
815 stars 197 forks

number of trainable parameters #19

Open andrey999333 opened 5 years ago

andrey999333 commented 5 years ago

There is one point I don't quite understand. When I downloaded your Keras implementation of BERT and checked the number of trainable parameters in the model summary, it showed ~177 million parameters, while the official BERT base model should have 110 million. Could you explain where this difference comes from?

Separius commented 5 years ago

Hi, I'm not entirely sure, but maybe it's because of the subword embeddings? Most of the time, people don't count the input embeddings in their model parameters.
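To check how much of a count comes from the embeddings, the parameters can be tallied by hand. The sketch below is only a rough illustration: it assumes the standard BERT base layout (hidden size 768, 12 layers, feed-forward size 3072, 512 positions, 2 segment types, a 30522-token vocabulary), none of which is stated in this thread, and it is not derived from the Keras model in this repository. `EncoderConfig` and the helper functions are hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class EncoderConfig:
    # Assumed BERT-base-style hyperparameters; adjust to match your checkpoint.
    vocab_size: int = 30522
    hidden: int = 768
    layers: int = 12
    intermediate: int = 3072
    max_positions: int = 512
    type_vocab: int = 2


def dense(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm(n: int) -> int:
    """Scale plus shift vectors of a layer normalization."""
    return 2 * n


def embedding_params(cfg: EncoderConfig) -> int:
    """Token, position and segment embedding tables plus their layer norm."""
    tables = (cfg.vocab_size + cfg.max_positions + cfg.type_vocab) * cfg.hidden
    return tables + layer_norm(cfg.hidden)


def encoder_params(cfg: EncoderConfig) -> int:
    """All transformer layers: self-attention block plus feed-forward block."""
    attention = 4 * dense(cfg.hidden, cfg.hidden) + layer_norm(cfg.hidden)
    feed_forward = (
        dense(cfg.hidden, cfg.intermediate)
        + dense(cfg.intermediate, cfg.hidden)
        + layer_norm(cfg.hidden)
    )
    return cfg.layers * (attention + feed_forward)


def pooler_params(cfg: EncoderConfig) -> int:
    """Single dense layer applied to the first token's output."""
    return dense(cfg.hidden, cfg.hidden)


def total_params(cfg: EncoderConfig) -> int:
    return embedding_params(cfg) + encoder_params(cfg) + pooler_params(cfg)


if __name__ == "__main__":
    cfg = EncoderConfig()
    print("embeddings:", embedding_params(cfg))
    print("encoder:   ", encoder_params(cfg))
    print("pooler:    ", pooler_params(cfg))
    print("total:     ", total_params(cfg))
```

Under these assumptions the embedding tables depend directly on `vocab_size`, so passing the vocabulary size of the checkpoint you actually loaded (for example `EncoderConfig(vocab_size=...)`) and comparing the result with the Keras summary may help narrow down where the extra parameters come from.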