Hello everyone,
I have learned that, to reduce the variance of the gradient estimator,
we usually apply the "reward baseline" technique in the policy gradient objective, i.e. we subtract a baseline from the reward before weighting the log-probability gradient.
However, I cannot find any reward baseline in the SeqGAN code.
Am I missing something?
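To make clear what I mean by a reward baseline, here is a minimal sketch. It assumes a toy softmax policy over three tokens with made-up rewards; none of the names below (`reinforce_grads`, `logits`, `rewards`) come from the SeqGAN code, they are only for illustration.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = np.exp(logits - logits.max())
    return z / z.sum()


def reinforce_grads(logits, rewards, baseline, n_samples, seed):
    """Per-sample REINFORCE gradient estimates w.r.t. the logits.

    Each row is (R(a) - baseline) * grad log pi(a). For a softmax policy,
    grad log pi(a) w.r.t. the logits is onehot(a) - probs.
    """
    rng = np.random.default_rng(seed)
    probs = softmax(logits)
    actions = rng.choice(len(probs), size=n_samples, p=probs)
    score = np.eye(len(probs))[actions] - probs
    advantage = rewards[actions] - baseline  # reward minus baseline
    return advantage[:, None] * score


# Hypothetical toy setup: the rewards share a large constant offset.
logits = np.array([0.1, 0.0, -0.1])
rewards = np.array([10.0, 11.0, 12.0])

# Same seed for both runs, so both use the same sampled actions.
grad_plain = reinforce_grads(logits, rewards, 0.0, 5000, seed=0)
grad_base = reinforce_grads(logits, rewards, rewards.mean(), 5000, seed=0)

var_plain = grad_plain.var(axis=0).sum()
var_base = grad_base.var(axis=0).sum()
print("mean gradient, no baseline:  ", grad_plain.mean(axis=0))
print("mean gradient, with baseline:", grad_base.mean(axis=0))
print("total variance, no baseline:  ", var_plain)
print("total variance, with baseline:", var_base)
```

Because the baseline here is a constant that does not depend on the sampled action, the expected gradient stays the same and only the variance of the estimate changes.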
Thanks in advance!