Reinforcement Learning fails to train the Darwin-op to walk

cyberbotics / webots

Webots Robot Simulator

Apache License 2.0


I'm using the PPO algorithm from OpenAI Baselines to train the Darwin-op robot to walk with reinforcement learning. At first the reward gets higher and higher, which suggests that training is progressing as expected. Then the reward suddenly drops and the robot changes its behavior, as if what it had learned no longer applies. I have tried other RL algorithms, but the same thing happens. Is there a seed or some other parameter in Webots that can change suddenly during a run, so that the RL process stops working once this hidden change occurs? Has anyone else using RL in Webots run into the same problem?
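To pin down exactly when the drop happens, and to compare it against anything that changes in the simulation at that moment, the per-episode rewards could be scanned with something like the sketch below. It is only an illustration: `find_reward_collapses`, `window`, and `drop_fraction` are hypothetical names that are not part of Baselines or Webots, and it assumes episode rewards are positive and already collected in a plain Python list.

```python
from collections import deque


def find_reward_collapses(episode_rewards, window=20, drop_fraction=0.5):
    """Return episode indices where the rolling mean reward first falls
    below drop_fraction of the best rolling mean seen so far.

    Hypothetical helper for illustration; assumes positive rewards.
    """
    recent = deque(maxlen=window)  # the last `window` episode rewards
    best_mean = None               # best rolling mean observed so far
    in_collapse = False            # True while we are below the threshold
    collapses = []

    for i, reward in enumerate(episode_rewards):
        recent.append(reward)
        if len(recent) < window:
            continue  # not enough episodes yet for a full window

        mean = sum(recent) / window
        if best_mean is None or mean > best_mean:
            best_mean = mean

        collapsed = best_mean > 0 and mean < drop_fraction * best_mean
        if collapsed and not in_collapse:
            collapses.append(i)  # record only the start of each collapse
        in_collapse = collapsed

    return collapses


if __name__ == "__main__":
    # Synthetic data: reward climbs for 100 episodes, then drops sharply.
    rewards = list(range(1, 101)) + [5] * 50
    print(find_reward_collapses(rewards))
```

If the flagged episode index lines up with a simulation reset, a world reload, or a fixed number of steps, that would point toward something changing on the Webots side; if it does not, the cause is more likely in the training process itself.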


Reinforcement Learning fails to train the Darwin-op to walk #1467