JalterMain / DDPG_Antv2

Implementing a modified DDPG algorithm in OpenAI Gym's Ant-v2 environment
MIT License

Hyperparameter tuning & other fixes to prevent divergence #1

Open · JalterMain opened this issue 2 years ago

JalterMain commented 2 years ago

-> The policy diverges quickly. Since the gradients have (hopefully) been fixed, the main suspects are probably one of the following, or a combination of them:

JalterMain commented 2 years ago

There is a lot of instability during training, even in the later stages, which looks like a strong indicator of overestimation bias in the critic.
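If overestimation bias is the culprit, one widely used mitigation (not something this thread says was adopted) is the pair of tricks from TD3: clipped double-Q targets, which bootstrap from the smaller of two critics, and target policy smoothing, which adds clipped noise to the target action. The sketch below is framework-free and illustrative only. The function names and the defaults (`gamma=0.99`, `noise_std=0.2`, `noise_clip=0.5`, `act_limit=1.0`) are assumptions, not values taken from this repo. The `estimated_target_value` helper uses a discrete-action toy, where every true Q-value is 0, to show why taking a maximum over noisy estimates is biased upward. In DDPG, the actor plays the role of that maximum implicitly.

```python
import random
from typing import List, Optional, Sequence


def clipped_double_q_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """TD targets y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a'))."""
    targets = []
    for r, done, q1, q2 in zip(rewards, dones, q1_next, q2_next):
        # Bootstrapping from the smaller of two critic estimates
        # counteracts the upward bias of a single noisy critic.
        bootstrap = 0.0 if done else gamma * min(q1, q2)
        targets.append(r + bootstrap)
    return targets


def smoothed_target_action(
    mu_next: Sequence[float],
    noise_std: float = 0.2,
    noise_clip: float = 0.5,
    act_limit: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Target policy smoothing: add clipped Gaussian noise to mu'(s')."""
    rng = rng or random.Random()
    action = []
    for a in mu_next:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        # Keep the perturbed action inside the valid action range.
        action.append(max(-act_limit, min(act_limit, a + eps)))
    return action


def estimated_target_value(
    n_trials: int, n_actions: int, noise: float, use_min: bool, seed: int = 0
) -> float:
    """Toy bias demo: every true Q is 0, estimates are Q + Gaussian noise.

    Returns the average value a greedy target would bootstrap from, using
    either one critic (use_min=False) or the min of two critics (use_min=True).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        q1 = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        best = max(range(n_actions), key=lambda i: q1[i])  # greedy action under critic 1
        if use_min:
            q2 = [rng.gauss(0.0, noise) for _ in range(n_actions)]
            total += min(q1[best], q2[best])
        else:
            total += q1[best]
    return total / n_trials
```

In the toy, `estimated_target_value(5000, 10, 1.0, use_min=False)` averages around +1.5 even though every true value is 0. The clipped variant stays at or slightly below 0. That slight underestimation is generally considered much less harmful than overestimation, because the actor does not chase it.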