PKU-Alignment / omnisafe

JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
https://www.omnisafe.ai
Apache License 2.0

[Question] A question about the cost function of the p3o algorithm #358

Open Liqinyan821 opened 1 week ago

Liqinyan821 commented 1 week ago

Required prerequisites

Questions

Hello OmniSafe team, thank you very much for your contribution. While learning the P3O algorithm, I noticed that the `_loss_pi_cost` function does not clip the probability ratio, whereas the cost loss in the P3O paper (Penalized Proximal Policy Optimization for Safe Reinforcement Learning) does use clipping.
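To make the difference concrete, here is a minimal pure-Python sketch of the two variants of the cost surrogate. It assumes the paper's pessimistic clipped form, max(r * A_C, clip(r, 1 - eps, 1 + eps) * A_C), and a penalty of the form kappa * max(0, surrogate + violation), where `violation` is a simplified stand-in for the paper's (1 - gamma) * (J_C - d) term. The function names are hypothetical and this is not OmniSafe's actual code.

```python
from typing import Optional, Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into [lo, hi]."""
    return max(lo, min(x, hi))


def cost_surrogate(
    ratios: Sequence[float],
    cost_advs: Sequence[float],
    eps: Optional[float] = None,
) -> float:
    """Mean cost surrogate over a batch (hypothetical helper).

    eps=None  -> unclipped form: r * A_C
    eps=float -> pessimistic clipped form (assumed from the paper):
                 max(r * A_C, clip(r, 1 - eps, 1 + eps) * A_C)
    """
    terms = []
    for r, a_c in zip(ratios, cost_advs):
        unclipped = r * a_c
        if eps is None:
            terms.append(unclipped)
        else:
            clipped = clip(r, 1.0 - eps, 1.0 + eps) * a_c
            # max, not min: cost is minimized, so the pessimistic bound is the larger term
            terms.append(max(unclipped, clipped))
    return sum(terms) / len(terms)


def cost_penalty(
    ratios: Sequence[float],
    cost_advs: Sequence[float],
    violation: float,
    kappa: float,
    eps: Optional[float] = None,
) -> float:
    """Exact-penalty cost loss: kappa * max(0, surrogate + violation).

    `violation` is a simplified stand-in for (1 - gamma) * (J_C - d).
    """
    return kappa * max(0.0, cost_surrogate(ratios, cost_advs, eps) + violation)


if __name__ == "__main__":
    ratios = [0.5, 1.0, 1.5]
    cost_advs = [1.0, 1.0, -1.0]
    print(cost_surrogate(ratios, cost_advs))            # unclipped
    print(cost_surrogate(ratios, cost_advs, eps=0.2))   # clipped
    print(cost_penalty(ratios, cost_advs, -0.1, kappa=20.0))
    print(cost_penalty(ratios, cost_advs, -0.1, kappa=20.0, eps=0.2))
```

On this toy batch the unclipped surrogate is 0.0 and the clipped one is about 0.2, so with violation = -0.1 the penalty is zero without the clip and about 2.0 with it. More generally, each clipped term is a max that includes r * A_C, so the clipped surrogate is never smaller than the unclipped one; dropping the clip can only make the penalty less likely to activate.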

Gaiejj commented 5 hours ago

You must be a very meticulous person! In fact, this is a trick we discovered while debugging the algorithm, which makes P3O more suitable for high-dimensional complex environments. Have you tried removing the clip? Do you have any experimental data? If it performs well without it, we will modify this implementation later.
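One mechanical difference worth keeping in mind when comparing the two variants: with the clipped form, a sample's cost term stops contributing gradient once its ratio has moved outside [1 - eps, 1 + eps] in the direction that lowers the cost surrogate, while the unclipped term r * A_C always has slope A_C. The hypothetical helper below computes that per-sample derivative with respect to the ratio for both variants; it is only an illustration, and whether this explains the behavior in high-dimensional environments is not established here.

```python
from typing import Optional


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into [lo, hi]."""
    return max(lo, min(x, hi))


def dterm_dratio(r: float, a_c: float, eps: Optional[float] = None) -> float:
    """d/dr of one sample's cost term (hypothetical helper).

    Unclipped term r * A_C has slope A_C everywhere.
    Clipped term max(r * A_C, clip(r) * A_C) has slope 0 whenever the
    clipped branch strictly wins, which only happens when r lies outside
    [1 - eps, 1 + eps], where clip(r) is constant in r.
    """
    if eps is None:
        return a_c
    clipped = clip(r, 1.0 - eps, 1.0 + eps) * a_c
    if clipped > r * a_c:
        return 0.0  # clipped branch active: no gradient through r
    return a_c


if __name__ == "__main__":
    samples = [(0.5, 1.0), (1.0, 1.0), (1.5, -1.0), (1.5, 1.0)]
    for r, a_c in samples:
        print(r, a_c, dterm_dratio(r, a_c), dterm_dratio(r, a_c, eps=0.2))
```

In this toy batch, the first and third samples have already moved past the clip range in the cost-reducing direction, so they get zero gradient from the clipped cost term but keep their full gradient when the clip is removed; the fourth sample moved in the cost-increasing direction and keeps its gradient in both variants.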