PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

Chapter 14 Deterministic policy gradients results are quite noisy. #86

Open isu10503054a opened 3 years ago

isu10503054a commented 3 years ago

In the book's results for Chapter 14 (Deterministic policy gradients), why is the training so unstable and noisy?


[Two attached screenshots: "Capture", "Capture 2"]

I read the content repeatedly, but I still don’t understand why.

Shmuma commented 3 years ago

Random weight initialization adds randomness to the starting point. Using different parallel environments might also add stochasticity.
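To check how much of the noise comes from these sources, one option is to fix the random seeds and compare several runs. The helper below is only a sketch: `seed_everything` is a hypothetical name, not a function from the book's repository, and it seeds NumPy and PyTorch only if they are installed. The environments need seeding separately, and how to do that depends on the Gym version in use.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the RNGs that are available (hypothetical helper, not from the repo)."""
    random.seed(seed)  # Python's built-in generator
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global generator
    except ImportError:
        pass  # NumPy not installed, nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # PyTorch's default generator (affects weight init)
    except ImportError:
        pass  # PyTorch not installed, nothing to seed


# Two draws after the same seed should match exactly.
seed_everything(123)
first = [random.random() for _ in range(3)]
seed_everything(123)
second = [random.random() for _ in range(3)]
print(first == second)
```

Even with fixed seeds, parallel environments may still deliver samples in a different order from run to run, so some variation can remain.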


isu10503054a commented 3 years ago


Is there any hyperparameter in the source code that can be modified to improve this situation? Thanks.

Shmuma commented 3 years ago

Tons of them :). In fact, any constant in the code can be seen as a hyperparameter.
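As a rough illustration of treating constants as hyperparameters, the sketch below collects a few in one place and enumerates combinations to try. The names and values (`gamma`, `batch_size`, `learning_rate`, `replay_size`) are assumptions made for the example, not copied from the chapter's training script, and `grid` is a hypothetical helper.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class HyperParams:
    # Illustrative names and values only; check the chapter's training
    # script for the constants it actually defines.
    gamma: float = 0.99
    batch_size: int = 64
    learning_rate: float = 1e-4
    replay_size: int = 100_000


def grid(base: HyperParams, **choices):
    """Yield one HyperParams per combination of the candidate values."""
    names = list(choices)
    for values in product(*(choices[name] for name in names)):
        yield replace(base, **dict(zip(names, values)))


# Six variants: 2 learning rates x 3 batch sizes, other fields unchanged.
variants = list(grid(HyperParams(),
                     learning_rate=[1e-3, 1e-4],
                     batch_size=[32, 64, 128]))
for hp in variants:
    print(hp)
```

Each variant would then be passed to a separate training run, ideally with more than one seed per variant so that noise is not mistaken for an improvement.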


-- wbr, Max Lapan