LantaoYu / SeqGAN

Implementation of Sequence Generative Adversarial Nets with Policy Gradient
2.08k stars, 711 forks

About baseline of reward function #45

Open jackyyeh5111 opened 6 years ago

jackyyeh5111 commented 6 years ago

Hello everyone. I have learned that, to reduce the variance of the gradient estimator, a "reward baseline" is usually subtracted from the reward in the policy-gradient objective, as in the attached image.
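The image is not reproduced in this copy of the thread. As an assumption about what it showed, the usual form of the policy gradient with a baseline is sketched below, written in notation borrowed from the SeqGAN paper: $G_\theta$ is the generator, $Q$ is the action value estimated from the discriminator, and $b$ is a hypothetical baseline that does not depend on the sampled token $y_t$.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{Y_{1:T} \sim G_\theta}\!\left[
      \sum_{t=1}^{T}
      \nabla_\theta \log G_\theta\!\left(y_t \mid Y_{1:t-1}\right)
      \bigl( Q(Y_{1:t-1}, y_t) - b \bigr)
    \right]
```

Because $\mathbb{E}_{y_t \sim G_\theta}\!\left[\nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1})\right] = 0$, subtracting such a $b$ leaves the expected gradient unchanged and only affects its variance.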

However, I cannot find any reward baseline technique in SeqGAN code. Am I missing something?

Thanks in advance!

TobiasLee commented 6 years ago

The code doesn't have this baseline trick. You can try it and evaluate it yourself.
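For anyone who wants to try it, here is a minimal sketch of one possible baseline, a per-position moving average of the rewards. It is not part of the SeqGAN code. The function name `subtract_baseline`, the `decay` parameter, and the assumed reward shape `(batch_size, seq_length)` are all hypothetical choices made for illustration.

```python
import numpy as np


def subtract_baseline(rewards, baseline=None, decay=0.9):
    """Subtract a running per-position baseline from the rewards.

    rewards:  array of shape (batch_size, seq_length) with one reward per
              generated token (assumed shape, not taken from the repo).
    baseline: running baseline of shape (seq_length,), or None on the
              first call.
    Returns (advantages, new_baseline).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    batch_mean = rewards.mean(axis=0)  # mean reward at each position

    if baseline is None:
        # First call: bootstrap the baseline from the current batch.
        baseline = batch_mean

    # Use the baseline from previous batches so that it does not depend
    # on the samples it is applied to (after the first call).
    advantages = rewards - baseline

    # Exponential moving average update for the next call.
    new_baseline = decay * baseline + (1.0 - decay) * batch_mean
    return advantages, new_baseline
```

In a training loop, the returned `advantages` would be fed to the generator update in place of the raw rewards, and `new_baseline` would be passed back in on the next step. Whether this helps in practice has to be checked empirically, as suggested above.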