google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Apache License 2.0

Electra Vocabulary #123

Open avinashsai opened 3 years ago

avinashsai commented 3 years ago

Hi,

I have a question about ELECTRA's vocabulary. In other models, the vocabulary consists of recognizable words, but most entries in ELECTRA's vocab are [unused0], [unused1], ... apart from a few special tokens. Why is that?

claeyzre commented 3 years ago

Hi,

This is the vocabulary from the original BERT implementation. A quick Google search turned up this explanation: https://stackoverflow.com/questions/62452271/understanding-bert-vocab-unusedxxx-tokens
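To illustrate, here is a minimal sketch (not code from this repo) of how the top of a BERT-style `vocab.txt` is laid out and why the `[unusedX]` entries exist. The exact layout below assumes the original BERT vocabulary ordering (`[PAD]` at id 0, placeholder slots, then `[UNK]`/`[CLS]`/`[SEP]`/`[MASK]` at ids 100-103):

```python
# Sketch of a BERT-style vocab file's first entries (one token per line).
# The [unusedX] entries are reserved placeholder slots: they never occur in
# pretraining text, so downstream users can repurpose them as domain-specific
# tokens without resizing the model's embedding matrix.
vocab_lines = ["[PAD]"] + [f"[unused{i}]" for i in range(99)] + [
    "[UNK]", "[CLS]", "[SEP]", "[MASK]",
]

# Map each token to its line index, exactly as WordPiece tokenizers do.
token_to_id = {tok: i for i, tok in enumerate(vocab_lines)}

unused_slots = [t for t in vocab_lines if t.startswith("[unused")]
print(len(unused_slots))       # placeholder slots before [UNK]
print(token_to_id["[UNK]"])    # special tokens start right after them
```

In the real 30k-entry vocab, many more `[unusedX]` slots follow `[MASK]`, which is why a casual scroll through the file shows mostly placeholders before the actual WordPiece subwords begin.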