Hi,
I'm getting very strange behavior that I can't explain when running your code, and I'd like to know whether you can reproduce it or help me understand it. I wanted to see if I could calculate a better reward, and along the way I tested with fixed values. That is, I replaced the implementation of rollout.py:get_reward() with:
```python
rewards = np.zeros((64, 20))
rewards.fill(2)
return rewards
```
Surprisingly, this made the generator converge faster, and to a lower test error (see attached log). I got pretty much the same behavior when I used rewards uniformly sampled from [0, 1]. I'm not sure what to make of it.
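For what it's worth, my current guess at why a constant reward still trains something: in a REINFORCE-style update, a constant reward just scales the log-likelihood gradient of the sampled sequences, so the generator is effectively doing MLE on its own samples rather than learning from the discriminator. A toy sketch of this (my own illustration, not code from the repo):

```python
import numpy as np

# Toy categorical policy over 3 tokens, parameterized by logits.
logits = np.array([0.5, -0.2, 0.1])
probs = np.exp(logits) / np.exp(logits).sum()

def reinforce_grad(token, reward):
    # Gradient of reward * log pi(token) w.r.t. the logits:
    # reward * (one_hot(token) - probs).
    one_hot = np.eye(3)[token]
    return reward * (one_hot - probs)

# With a constant reward, the update is just a rescaled
# log-likelihood gradient of the sampled token:
g_const = reinforce_grad(token=1, reward=2.0)
g_mle = reinforce_grad(token=1, reward=1.0)
assert np.allclose(g_const, 2.0 * g_mle)
```

If that reading is right, the constant-reward run would mostly be continuing maximum-likelihood training, which could plausibly explain the smooth convergence in the log. Does that match your understanding?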
Also, a question: why does the rollout network lag behind the generator (the default update rate is 0.8)? Don't we, in theory, want to sample rollouts from the latest generator?
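To make sure I'm reading rollout.py correctly: my understanding is that the 0.8 acts as an exponential-moving-average coefficient, so each update keeps most of the old rollout weights and mixes in only a fraction of the current generator weights. A sketch of what I think is happening (names are mine, not the repo's):

```python
import numpy as np

def update_rollout(rollout_w, generator_w, update_rate=0.8):
    # Hypothetical sketch: the rollout weights keep `update_rate` of their
    # old value and absorb (1 - update_rate) of the latest generator weights.
    return update_rate * rollout_w + (1 - update_rate) * generator_w

w_roll = np.zeros(4)   # stale rollout weights
w_gen = np.ones(4)     # latest generator weights
w_roll = update_rollout(w_roll, w_gen)
# -> array([0.2, 0.2, 0.2, 0.2]): the rollout network tracks the
# generator with a delay rather than copying it outright.
```

If that's right, I'd guess the lag trades freshness for stability of the Monte Carlo reward estimates, but I'd appreciate confirmation.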
fixed-seqgan-log.txt