Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Question regarding the commit: Fix stddev to match NoisyNet paper #51

Closed: marintoro closed this issue 5 years ago

marintoro commented 5 years ago

Hello, I see that one commit (6c8b281) tries to fix the default value of the stddev in the noisy layer, but I think it is overridden anyway by the default value in args.py, which is 0.1.

Moreover, there is a small note in the original Rainbow paper, just above Table 1, stating: "The noise was generated on the GPU. Tensorflow noise generation can be unreliable on GPU. If generating the noise on the CPU, lowering σ0 to 0.1 may be helpful"

So I think you actually used 0.1 in all your experiments, not 0.5 (nor 0.4), and this may be the right thing to do?

Kaixhin commented 5 years ago

Yes, the default value in the layer class code was set to the GPU default (0.5); 0.4 must have been an error. And yes, the value is usually overridden by the CPU default (0.1) in main.py when running Rainbow. I can't remember if I experimented much with this, but since the results replicated, I've kept 0.1.
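
For readers following the thread, here is a minimal sketch of the override being discussed: a constructor default of 0.5 that is never used because the command-line default of 0.1 is always passed in. The class name `NoisyLayerConfig`, the helper `make_layer`, the layer sizes, and the flag name `--noisy-std` are assumptions for illustration, not the repository's actual code.

```python
import argparse


class NoisyLayerConfig:
    """Hypothetical stand-in for a noisy layer; only the std_init default matters here."""

    def __init__(self, in_features, out_features, std_init=0.5):
        self.in_features = in_features
        self.out_features = out_features
        self.std_init = std_init  # sigma_0 from the NoisyNet paper


def build_parser():
    parser = argparse.ArgumentParser()
    # Assumed flag name; this default is what the layer receives in practice.
    parser.add_argument("--noisy-std", type=float, default=0.1)
    return parser


def make_layer(argv):
    """Build the layer the way a training script would, from parsed arguments."""
    args = build_parser().parse_args(argv)
    # The parsed value is always passed explicitly, so the constructor
    # default of 0.5 only applies when the class is used on its own.
    return NoisyLayerConfig(512, 18, std_init=args.noisy_std)
```

With no arguments, `make_layer([])` yields a layer with `std_init` of 0.1; passing `["--noisy-std", "0.5"]` is the only way to get 0.5 through the script.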

alirezakazemipour commented 2 years ago

Thank you very much for pointing this out! I had missed that small recommendation above the table, and it has a HUGE impact on the final performance. In fact, I could not get any result on the game of Boxing with σ0 set to 0.5, but with 0.1 everything worked.
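
To give a rough sense of scale, the sketch below assumes the factorised initialisation from the NoisyNet paper, where each noise-scale parameter starts at σ0 / √p for a layer with p inputs. The function name `initial_sigma` and the fan-in of 512 are hypothetical choices for illustration; the repository's layer may differ in detail.

```python
import math


def initial_sigma(sigma0, fan_in):
    """Initial noise scale per weight, assuming sigma_0 / sqrt(fan_in)."""
    return sigma0 / math.sqrt(fan_in)


# Compare the two settings discussed in this thread for an assumed fan-in of 512.
for sigma0 in (0.5, 0.1):
    print(f"sigma0={sigma0}: initial sigma = {initial_sigma(sigma0, 512):.5f}")
```

Under this assumption, σ0 = 0.5 starts every layer with five times the noise scale of σ0 = 0.1, regardless of the layer width.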