Closed JiwenJ closed 1 month ago
As previously discussed, we found that CVPO requires adjustments to the environment's configuration (specifically, removing randomness from the environment layout). We ran detailed experiments and found that CVPO's performance on randomized layouts was unsatisfactory.
Relevant experimental evidence has also been reported in the SafeRL literature; see the ICLR 2024 paper Off-Policy Primal-Dual Safe Reinforcement Learning.
In the meantime, we are working on contributing the environment customizations this algorithm requires as modifications to Safety-Gymnasium, to make them easier for the community to use.
If you have any comments or ideas, feel free to discuss them further.
Since there has been no response for a long time, we will close this issue. Please feel free to reopen it if you encounter any new problems!
Required prerequisites
Questions
Hello, I believe there was a CVPO implementation mentioned in issue #57. I'm curious why it was removed.