Open Liqinyan821 opened 1 week ago
You must be a very meticulous person! In fact, this is a trick we discovered while debugging the algorithm; it makes P3O more suitable for high-dimensional, complex environments. Have you tried removing the clip? Do you have any experimental data? If it performs well without it, we will modify this implementation later.
Required prerequisites
Questions
Hello OmniSafe team, thank you very much for your contribution. While learning the P3O algorithm, I found that the `_loss_pi_cost` function does not clip the cost surrogate, whereas the cost loss in the P3O paper (Penalized Proximal Policy Optimization for Safe Reinforcement Learning) does use a clip.
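
To make the difference concrete, here is a minimal sketch of the two variants of the cost surrogate. It is not the OmniSafe code: `cost_surrogate`, its arguments, and the `clip_eps` value are hypothetical names chosen for illustration, and plain Python lists stand in for tensors. The clipped branch assumes the usual PPO-style form mirrored for costs (a `max` instead of a `min`, since costs are minimized).

```python
import math


def cost_surrogate(logp_new, logp_old, adv_c, clip_eps=None):
    """Mean cost surrogate over a batch (hypothetical helper, not OmniSafe API).

    clip_eps=None  -> unclipped surrogate: mean(ratio * adv_c)
    clip_eps=float -> clipped surrogate:
                      mean(max(ratio * adv_c, clip(ratio) * adv_c))
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, adv_c):
        # Importance ratio between the new and the old policy
        ratio = math.exp(lp_new - lp_old)
        term = ratio * adv
        if clip_eps is not None:
            clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
            # Costs are minimized, so the pessimistic bound is the larger term
            term = max(term, clipped_ratio * adv)
        total += term
    return total / len(adv_c)


# Same batch, evaluated with and without the clip
logp_old = [0.0, 0.0]
logp_new = [-1.0, 0.5]
adv_c = [1.0, -1.0]
unclipped = cost_surrogate(logp_new, logp_old, adv_c)
clipped = cost_surrogate(logp_new, logp_old, adv_c, clip_eps=0.2)
```

With the `max` form, the clipped value can never be smaller than the unclipped one, so the two losses only differ on samples whose ratio has left the `[1 - clip_eps, 1 + clip_eps]` interval.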