Hi,
I'm getting very strange behavior that I can't explain when running your code, and I'd like to know whether you can reproduce it or help me understand it. I wanted to see if I could calculate a better reward, and along the way I tested with fixed values. That is, I replaced the implementation of rollout.py:get_reward() with:
```python
rewards = np.zeros((64, 20))
rewards.fill(2)
return rewards
```
Surprisingly, this made the generator converge faster, and to a lower test error (see attached log). I got pretty much the same behavior when I used rewards uniformly sampled from [0, 1]. I'm not sure what to make of it.
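For what it's worth, my current guess at why a constant reward still trains something: in a REINFORCE-style update, a constant reward just scales the log-likelihood gradient of the sampled sequences, so the generator is effectively doing MLE on its own samples rather than learning from the discriminator. A toy sketch of this (my own illustration, not code from the repo):

```python
import numpy as np

# Toy categorical policy over 3 tokens, parameterized by logits.
logits = np.array([0.5, -0.2, 0.1])
probs = np.exp(logits) / np.exp(logits).sum()

def reinforce_grad(token, reward):
    # Gradient of reward * log pi(token) w.r.t. the logits:
    # reward * (one_hot(token) - probs).
    one_hot = np.eye(3)[token]
    return reward * (one_hot - probs)

# With a constant reward, the update is just a rescaled
# log-likelihood gradient of the sampled token:
g_const = reinforce_grad(token=1, reward=2.0)
g_mle = reinforce_grad(token=1, reward=1.0)
assert np.allclose(g_const, 2.0 * g_mle)
```

If that reading is right, the constant-reward run would mostly be continuing maximum-likelihood training, which could plausibly explain the smooth convergence in the log. Does that match your understanding?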
Also, a question: why does the rollout network lag behind the generator (the default update rate is 0.8)? Don't we, in theory, want to sample rollouts from the latest generator?
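To make sure I'm reading rollout.py correctly: my understanding is that the 0.8 acts as an exponential-moving-average coefficient, so each update keeps most of the old rollout weights and mixes in only a fraction of the current generator weights. A sketch of what I think is happening (names are mine, not the repo's):

```python
import numpy as np

def update_rollout(rollout_w, generator_w, update_rate=0.8):
    # Hypothetical sketch: the rollout weights keep `update_rate` of their
    # old value and absorb (1 - update_rate) of the latest generator weights.
    return update_rate * rollout_w + (1 - update_rate) * generator_w

w_roll = np.zeros(4)   # stale rollout weights
w_gen = np.ones(4)     # latest generator weights
w_roll = update_rollout(w_roll, w_gen)
# -> array([0.2, 0.2, 0.2, 0.2]): the rollout network tracks the
# generator with a delay rather than copying it outright.
```

If that's right, I'd guess the lag trades freshness for stability of the Monte Carlo reward estimates, but I'd appreciate confirmation.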
fixed-seqgan-log.txt