google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Some confusion about the paper #124

Open leileilin opened 3 years ago

leileilin commented 3 years ago

Hey, I've read your team's paper, and there are a few points I don't quite understand. In the paper you mention that you don't back-propagate the discriminator loss through the generator (indeed, you can't, because of the sampling step), yet you also say the generator and discriminator are pre-trained at the same time. I don't understand how that works. Thank you.

yilihsu commented 1 year ago

Same question here. In the training code, the generator and discriminator are trained together with a single total loss. Does anyone have any ideas?
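
For anyone landing here later, this is how I read the training loop described in the paper: the generator's MLM loss and the discriminator's loss are summed into one objective and minimized jointly, and the discrete sampling step is what blocks gradients from the discriminator into the generator. Below is a minimal sketch, assuming PyTorch; `generator`, `discriminator`, `MASK_ID`, and the tensor shapes are hypothetical illustrations, not this repo's actual TensorFlow code, and `disc_weight=50.0` is the λ weight reported in the paper:

```python
import torch
import torch.nn.functional as F

MASK_ID = 103  # [MASK] id in a BERT-style vocab (assumption for this sketch)

def train_step(generator, discriminator, optimizer, tokens, mask,
               disc_weight=50.0):
    """One joint pre-training step. `tokens`: LongTensor [batch, seq];
    `mask`: BoolTensor [batch, seq] marking the masked-out positions."""
    masked = tokens.clone()
    masked[mask] = MASK_ID

    # Generator: ordinary masked-language-model loss on the masked positions.
    gen_logits = generator(masked)                       # [batch, seq, vocab]
    mlm_loss = F.cross_entropy(gen_logits[mask], tokens[mask])

    # Sample replacement tokens from the generator's output distribution.
    # Sampling is a discrete, non-differentiable step, so no gradient can
    # flow through it from the discriminator back into the generator.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(
            logits=gen_logits[mask]).sample()
    corrupted = tokens.clone()
    corrupted[mask] = sampled
    is_replaced = (corrupted != tokens).float()          # discriminator labels

    # Discriminator: per-token binary classification, original vs. replaced.
    disc_logits = discriminator(corrupted)               # [batch, seq]
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    # One summed objective, one backward pass: the generator only receives
    # gradients from mlm_loss, the discriminator only from disc_loss.
    loss = mlm_loss + disc_weight * disc_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

So "trained at the same time" just means one optimizer step on the summed objective: `loss.backward()` runs once, but because `sample()` has no gradient, the generator's parameters are updated only by the MLM term while the discriminator's are updated only by its own term.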