google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Apache License 2.0

The unbalance between original tokens and replaced tokens. #109

Open allanchen95 opened 3 years ago

allanchen95 commented 3 years ago

Hi, ELECTRA has inspired me a lot, but one problem puzzles me. As we all know, only about 15% of the tokens are replaced by generated tokens, which can be viewed as the negatives. The remaining 85% or so are original tokens, i.e., the positives.
Label imbalance is a common issue in classification problems, and ELECTRA is designed to predict, for every token in a corrupted sentence, whether it is original or replaced. So my question is: can ELECTRA accurately find all the negatives (the replaced tokens, which the discriminator should predict as 0) when the positives (the original tokens) dominate?
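
To make the concern concrete, here is a minimal sketch (not code from the ELECTRA repository) of why plain accuracy can hide this problem. It assumes the roughly 15% / 85% split quoted above; the function name `replaced_token_metrics` and the toy data are hypothetical. Labels are expressed as "is this token replaced?" booleans, so `True` corresponds to what the post calls a negative (label 0).

```python
def replaced_token_metrics(is_replaced, predicted_replaced):
    """Return (accuracy, precision, recall), with precision and recall
    computed for the 'replaced' class only.

    Both arguments are equal-length sequences of booleans, one per token.
    This is an illustrative helper, not part of ELECTRA.
    """
    pairs = list(zip(is_replaced, predicted_replaced))
    tp = sum(1 for y, p in pairs if y and p)        # replaced, flagged
    fp = sum(1 for y, p in pairs if not y and p)    # original, flagged
    fn = sum(1 for y, p in pairs if y and not p)    # replaced, missed
    correct = sum(1 for y, p in pairs if y == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy data (assumed split): 100 tokens, 15 replaced and 85 original.
labels = [True] * 15 + [False] * 85

# A degenerate discriminator that calls every token "original".
always_original = [False] * 100

acc, prec, rec = replaced_token_metrics(labels, always_original)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

In this toy case the degenerate predictor still gets most tokens right while finding none of the replaced ones, so recall on the replaced tokens is the number that actually answers the question above.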