google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Apache License 2.0

Electra-small embedding size #105

Open YovaKem opened 3 years ago

YovaKem commented 3 years ago

The README says ELECTRA-small has 256 hidden units and ELECTRA-base has 768. Here, the embedding size for a small model is being set to 128 and for a base model to 768 (I've sketched the discrepancy below the questions). I have two clarifying questions:

  1. Are embedding size and hidden size used interchangeably, and if not, how is the latter set in the config?

  2. What are the actual embedding size and hidden size for ELECTRA-small? Are they 128 and 256, respectively, or are the two terms used interchangeably? In the latter case there is an inconsistency between the README and the config, and it's not clear whether the value should be 128 or 256.
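To make the discrepancy concrete, here is a minimal sketch of the two sets of numbers I'm comparing (the labels are mine, not identifiers from the repo):

```python
# The two sources I'm comparing (values as I read them from the README and
# the config; variable names are my own, not from the repo).
readme_hidden_size = {"small": 256, "base": 768}     # "hidden units" per the README
config_embedding_size = {"small": 128, "base": 768}  # embedding_size per the config

for size in ("small", "base"):
    print(f"{size}: README hidden={readme_hidden_size[size]}, "
          f"config embedding={config_embedding_size[size]}")
# small: README hidden=256, config embedding=128  <- these disagree (question 2)
# base:  README hidden=768, config embedding=768  <- these agree (hence question 1)
```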

Thanks.

GabboM commented 3 years ago

In ELECTRA, embedding size and hidden size should be the same. As far as I know, only ALBERT factorizes the embedding matrix, and thus has a smaller embedding size than hidden size.
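To illustrate what that factorization buys in terms of parameters, here is a rough back-of-the-envelope sketch with ALBERT-base-like numbers (the figures are only illustrative):

```python
# Back-of-the-envelope parameter count for the token embeddings, with and
# without ALBERT-style factorization (illustrative ALBERT-base-like numbers).
vocab_size = 30000    # V
hidden_size = 768     # H
embedding_size = 128  # E, the smaller factorized embedding size

untied = vocab_size * hidden_size                                        # V x H
factorized = vocab_size * embedding_size + embedding_size * hidden_size  # V x E + E x H

print(untied, factorized)  # 23040000 vs 3938304 -- the factorization saves ~19M parameters
```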

YovaKem commented 3 years ago

From experience, I can say they're not. I tried training a smaller ELECTRA model by setting the embedding size (the only size-related hyperparameter that can be set in the bash script) to 64, and that still trained a model with a hidden size of 256.
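If I read the code correctly, that's because the hidden size doesn't come from the embedding-size flag at all: it is picked from the model_size presets when the BERT config is built, and can apparently only be changed through the model_hparam_overrides dict. The function below is my own paraphrase of that logic, not the repo's code:

```python
# My paraphrase of how the hidden size seems to be chosen (based on my reading
# of the model-size defaults and model_hparam_overrides; not the actual code).
def paraphrased_hidden_size(model_size, model_hparam_overrides=None):
    presets = {"small": 256, "base": 768, "large": 1024}
    hidden_size = presets[model_size]
    # Only an explicit override changes it; embedding_size is never consulted.
    if model_hparam_overrides:
        hidden_size = model_hparam_overrides.get("hidden_size", hidden_size)
    return hidden_size

print(paraphrased_hidden_size("small"))                       # 256, whatever embedding_size is
print(paraphrased_hidden_size("small", {"hidden_size": 64}))  # 64
```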