PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning
https://parl.readthedocs.io/
Apache License 2.0

Normalizing PPO output actions #943

Open · yufeng-Lu520 opened this issue 2 years ago

yufeng-Lu520 commented 2 years ago

In the PPO algorithm in the examples, why are the actions output by agent.sample not in the range -1 to 1? How can I normalize the output actions?

TomorrowIsAnOtherDay commented 2 years ago

https://github.com/PaddlePaddle/PARL/blob/e4a20ae6390265203b359f2b85e1fdd30d373434/examples/PPO/mujoco_model.py#L78 If you want the actions normalized, just add a tanh activation function at this line.
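The suggestion above can be illustrated with a minimal NumPy sketch. It is not PARL's API: `policy_mean`, `sample_action`, and `scale_to_bounds` are hypothetical names, and a single linear layer stands in for the actual network in `mujoco_model.py`. Applying tanh to the actor's output bounds the mean to [-1, 1]. If the agent then samples from a Gaussian around that mean (as continuous-control PPO typically does), the sampled action can still fall outside [-1, 1]. The sketch therefore also clips the action and rescales it to an assumed environment range `[low, high]` before it would be sent to the environment.

```python
import numpy as np


def policy_mean(features, weight, bias, squash=True):
    """Hypothetical linear actor head; tanh bounds the mean to [-1, 1]."""
    mean = features @ weight + bias
    return np.tanh(mean) if squash else mean


def sample_action(mean, log_std, rng):
    """Gaussian sampling around the mean; the sample itself is NOT bounded."""
    std = np.exp(log_std)
    return mean + std * rng.standard_normal(mean.shape)


def scale_to_bounds(action, low, high):
    """Clip an action to [-1, 1], then map it linearly onto [low, high]."""
    action = np.clip(action, -1.0, 1.0)
    return low + (action + 1.0) * 0.5 * (high - low)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((4, 3))  # batch of 4 hypothetical observations
    weight = rng.standard_normal((3, 2))    # 2-dimensional action space
    mean = policy_mean(features, weight, np.zeros(2))
    action = sample_action(mean, log_std=np.zeros(2), rng=rng)
    print("mean in [-1, 1]:", bool(np.all(np.abs(mean) <= 1.0)))
    print("env action:", scale_to_bounds(action, low=-2.0, high=2.0))
```

Clipping keeps the PPO log-probability computation unchanged, because the policy distribution itself is not modified; only the action passed to the environment is bounded.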

yufeng-Lu520 commented 2 years ago

Thanks a lot, it's solved now.