cyberbotics / webots

Webots Robot Simulator
https://cyberbotics.com
Apache License 2.0

Reinforcement Learning fails to train the Darwin-op to walk #1467

Closed: Weiyi-Zhang258 closed this issue 4 years ago

Weiyi-Zhang258 commented 4 years ago

I'm using the PPO algorithm from OpenAI Baselines to train the Darwin-OP robot to walk with reinforcement learning. At first the reward keeps increasing, which suggests that training is progressing as expected. But then the reward suddenly drops and the robot changes its behavior, as if the earlier training no longer has any effect. I have tried other RL algorithms and the same thing happens. I wonder whether there is a seed or some other parameter in Webots that changes suddenly, so that when this hidden change occurs, the RL process stops working. Has anyone using RL in Webots run into the same problem?
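To line the collapse up with other events in the training log (simulation resets, reverts, checkpoints), it can help to know exactly which episode it starts at. Below is a small sketch of such a check; it is not part of Baselines or Webots. The name `find_reward_collapse`, the 20-episode window and the 0.5 drop ratio are hypothetical, illustrative choices, and it assumes episode rewards are mostly positive.

```python
from collections import deque
from typing import Iterable, Optional


def find_reward_collapse(
    rewards: Iterable[float], window: int = 20, drop_ratio: float = 0.5
) -> Optional[int]:
    """Return the index of the first episode whose moving-average reward falls
    below drop_ratio * (best moving average so far), or None if it never does.

    Hypothetical helper; assumes rewards are mostly positive.
    """
    recent: deque = deque(maxlen=window)
    total = 0.0
    best: Optional[float] = None
    for i, reward in enumerate(rewards):
        if len(recent) == window:
            # The oldest reward is about to be evicted by append().
            total -= recent[0]
        recent.append(reward)
        total += reward
        if len(recent) < window:
            continue  # not enough episodes yet for a full window
        avg = total / window
        if best is None or avg > best:
            best = avg  # new best moving average
        elif best > 0 and avg < drop_ratio * best:
            return i  # sharp drop relative to the best average seen
    return None
```

Printing the returned index next to the episode counter of each reset or revert makes it easy to see whether the drop coincides with something on the simulation side or happens in the middle of otherwise identical episodes.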

DavidMansolino commented 4 years ago

About the seed: the internal seed is always the same, and it is reset each time the simulation is reset or reverted. One thing you could try is to run your learning algorithm as an external script (a Python one, for example) that launches Webots at each learning iteration. This way you can be 100% sure that every iteration is identical from the simulation's point of view. Just for information, for such general questions you may use our Discord community channel next time: https://discordapp.com/invite/nTWbN9m
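A minimal sketch of the external-script approach, assuming a Python learner and a world file at the hypothetical path `worlds/darwin_walk.wbt`. The command-line flags are ones documented for recent Webots releases, but check `webots --help` for your installed version. The function names and the way episode results come back from the controller are assumptions for illustration only.

```python
import subprocess
from typing import Callable, List


def build_webots_command(world_file: str, webots_binary: str = "webots") -> List[str]:
    """Command line for one headless Webots run (flags: verify with `webots --help`)."""
    return [
        webots_binary,
        "--batch",         # prevent blocking pop-up dialogs
        "--mode=fast",     # run the simulation as fast as possible
        "--no-rendering",  # disable rendering of the 3D view
        "--minimize",      # start with the window minimized
        "--stdout",        # forward controller stdout to this terminal
        "--stderr",        # forward controller stderr to this terminal
        world_file,
    ]


def run_learning_iterations(
    world_file: str,
    n_iterations: int,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> List[int]:
    """Start a fresh Webots process for every learning iteration.

    Each process loads the world file from scratch, so no simulator state from a
    previous iteration can carry over. run() blocks until Webots exits, so the
    robot controller is assumed to end the run itself (for example with
    Supervisor.simulationQuit) after writing its results somewhere the learner
    can read them (file, socket, ...; hypothetical, not shown here).
    """
    return_codes = []
    for _ in range(n_iterations):
        completed = run(build_webots_command(world_file), check=False)
        return_codes.append(completed.returncode)
        # Hypothetical step: load this iteration's episode data and update the policy.
    return return_codes
```

For example, `run_learning_iterations("worlds/darwin_walk.wbt", n_iterations=100)` would start 100 independent Webots processes one after another. Relaunching is slower than resetting from a supervisor, but every iteration then starts from the same freshly loaded world.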