Seraphli closed this issue 6 years ago
I also ran the SeqGAN code. The rewards in the SeqGAN experiment are changing.
That's because of the Bootstrapped Rescaled Activation trick.
@LeeJuly30 Could you explain a bit more? I can't understand why the Bootstrapped Rescaled Activation trick causes this behavior. And how can it even be trained with fixed rewards?
In this paper:

> For each timestep t, we rescale the t-th column vector R_t of the reward matrix via ranking:
>
> R^σ_t(i) = σ(δ · (0.5 − rank(i) / B))
>
> where rank(i) denotes the high-to-low rank of the i-th element within the column and δ is a rescale hyperparameter.
If you look at the formula, you will find that the rescaled reward depends only on the batch size B and the reward's rank within the mini-batch, so the expectation and variance of the rewards within a mini-batch won't change. In this way, the rescale activation serves as a value stabilizer, which is helpful for algorithms that are sensitive to numerical variance.
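To make the point concrete, here is a minimal NumPy sketch of the rescaling described above. The function name `rescale` and the value `delta=12.0` are illustrative assumptions, not the repo's actual code; the key property it demonstrates is that any two batches of distinct rewards map to permutations of the same fixed set of values, so the mini-batch mean and variance are constant.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rescale(rewards, delta=12.0):
    # Bootstrapped Rescaled Activation (sketch): replace each reward with
    # sigmoid(delta * (0.5 - rank(i)/B)), where rank(i) is the high-to-low
    # rank of reward i within the mini-batch of size B.
    B = len(rewards)
    order = np.argsort(-np.asarray(rewards))  # indices sorted high-to-low
    ranks = np.empty(B)
    ranks[order] = np.arange(1, B + 1)        # rank 1 = largest reward
    return sigmoid(delta * (0.5 - ranks / B))

# Two batches with very different raw rewards...
r1 = rescale([0.1, 5.0, 2.0, 3.3])
r2 = rescale([100.0, -7.0, 0.3, 9.0])
# ...yield rescaled values that are permutations of the same set,
# so the per-batch mean and variance are identical.
print(r1.mean(), r2.mean())
print(r1.std(), r2.std())
```

Individual sequences still get different rewards from batch to batch (their ranks change), but the batch-level statistics never move.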
@LeeJuly30 But the results above ran for several mini-batches (115 batches in total), and the rewards didn't change at all. So when will they change?
The rewards themselves will change; it is the expectation and variance of the rewards within a mini-batch that won't change.
@LeeJuly30 Thanks for the tip. I find the rewards do change, and the magnitude of the rewards before rescaling changes a lot.
When I ran the code, I tried printing the mean value of the rewards. Strangely, the mean of the rewards didn't change during training. The code snippet I used is here; I just added a print under line 285:
The output is here: