Normalizing PPO output actions

PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning

https://parl.readthedocs.io/

Apache License 2.0


Open yufeng-Lu520 opened 2 years ago

yufeng-Lu520 commented 2 years ago

In the PPO algorithm under examples, why are the actions returned by agent.sample not within [-1, 1]? How can I normalize the output actions?
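A minimal sketch of why this happens and two common fixes. It assumes the continuous-action PPO example uses a Gaussian policy, where a sampled action is the mean plus scaled noise. That sum is unbounded, so it can leave [-1, 1] even when the mean is inside it. The helpers below (`gaussian_sample`, `clip_to_unit`, `squash_to_unit`, `scale_to_bounds`) are hypothetical illustrations written with NumPy. They are not part of PARL's API and not the maintainer's reply.

```python
import numpy as np


def gaussian_sample(mean, std, rng):
    # Hypothetical helper: a Gaussian policy samples mean + std * noise.
    # The result is unbounded, so values outside [-1, 1] are expected.
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    return mean + std * rng.standard_normal(mean.shape)


def clip_to_unit(action):
    # Option 1: hard-clip the sampled action into [-1, 1].
    return np.clip(action, -1.0, 1.0)


def squash_to_unit(action):
    # Option 2: squash smoothly into (-1, 1) with tanh.
    # If this is used inside the policy during training, the log-probability
    # needs a tanh Jacobian correction; that is not handled here.
    return np.tanh(action)


def scale_to_bounds(unit_action, low, high):
    # Linearly map an action in [-1, 1] onto the environment's [low, high].
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    return low + (unit_action + 1.0) * 0.5 * (high - low)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = gaussian_sample([0.5, -0.2], [1.0, 1.0], rng)
    env_action = scale_to_bounds(clip_to_unit(raw), low=[-2.0, -2.0], high=[2.0, 2.0])
    print("raw sample:", raw)
    print("action sent to env:", env_action)
```

A common design choice in PPO implementations is to clip only the copy of the action passed to the environment, while keeping the raw, unclipped sample for the log-probability used in the update, so the policy gradient stays consistent with the Gaussian distribution.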

TomorrowIsAnOtherDay commented 2 years ago

yufeng-Lu520 commented 2 years ago

Thank you very much, it's solved now.