Hi Max,
I've found the NoisyNets implementation to have unstable training dynamics. In my experiments, only 1-2 out of 5 runs converge when using the shortened Pong hyperparams (with both Independent Gaussians and Factored Gaussians). Reducing the learning rate from 1e-4 to 5e-5 seems to raise that to 4-5 out of 5 runs, with little change in convergence speed. I hope this helps anybody else out there who might be having trouble with it.
Cheers, Dave