The imbalance between original tokens and replaced tokens

Hi, ELECTRA has inspired me a lot, but one thing puzzles me. As we all know, only about 15% of the tokens are replaced by generated tokens, which can be viewed as the negatives. The remaining roughly 85% are original tokens, i.e., the positives.
Label imbalance is a common issue in classification, and ELECTRA's discriminator is trained to predict, for every token in a corrupted sentence, whether it is original or replaced. So my question is: can ELECTRA accurately find all the negatives (the replaced tokens, which the discriminator should predict as 0) when the positives (the original tokens) heavily outnumber them?
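
To make the concern concrete, here is a minimal sketch of the imbalance. It is not taken from the ELECTRA code base. It assumes each token is replaced independently with probability 0.15 and uses the label convention from this post (1 = original, 0 = replaced). The function names `simulate_labels`, `accuracy`, and `recall_on_replaced` are hypothetical helpers written only for this illustration. The sketch shows that a degenerate predictor that always answers "original" looks accurate overall while finding none of the replaced tokens.

```python
import random


def simulate_labels(num_tokens, replace_rate=0.15, seed=0):
    """Return per-token labels: 1 = original (positive), 0 = replaced (negative).

    Assumption: each token is replaced independently with probability
    `replace_rate`. This is a simplification for illustration only.
    """
    rng = random.Random(seed)
    return [0 if rng.random() < replace_rate else 1 for _ in range(num_tokens)]


def accuracy(labels, preds):
    """Fraction of tokens whose prediction matches the label."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def recall_on_replaced(labels, preds):
    """Fraction of replaced tokens (label 0) that were predicted as 0."""
    replaced_preds = [p for l, p in zip(labels, preds) if l == 0]
    if not replaced_preds:
        return 0.0
    return sum(p == 0 for p in replaced_preds) / len(replaced_preds)


labels = simulate_labels(10_000)
always_original = [1] * len(labels)  # degenerate baseline: never predicts "replaced"

print(f"accuracy:           {accuracy(labels, always_original):.3f}")
print(f"recall on replaced: {recall_on_replaced(labels, always_original):.3f}")
```

Under these assumptions the baseline's accuracy lands near 0.85 while its recall on replaced tokens is exactly zero, so per-class recall or precision on the replaced tokens would be a more informative measure than overall accuracy.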

google-research/electra, issue #109