stvsd1314 closed this issue 1 year ago
You are right that all of SafePO's current algorithms support only continuous action spaces. As for applying an interior-point method to discrete spaces, we suggest consulting the referenced video for further guidance; we have done very little research on this topic ourselves.
Our team is committed to sharing and open source. We also encourage you to try our other two algorithm libraries and environments, which are tailored to constrained optimization and safe reinforcement learning.
Thanks for your response! By the way, I wonder what `penalty = self.kappa / (c + 1e-8)` means in IPO's policy-loss calculation. Shouldn't it take a log form, like the equation given in the original paper?
For more details, you can refer to https://github.com/OmniSafeAI/omnisafe/issues/223. Feel free to ask us to reopen this issue if you have more questions.
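For readers who hit the same question: one way to see the connection is that `kappa / c` is exactly the derivative of the log-barrier term `kappa * log(c)` with respect to the slack. A minimal numerical sketch, assuming `c` denotes the constraint slack (cost limit minus episodic cost, positive inside the feasible region) and using hypothetical values:

```python
import math

kappa = 0.01  # barrier coefficient (hypothetical value)
c = 5.0       # constraint slack: cost_limit - episodic_cost (hypothetical value)

# Log-barrier term as written in the IPO paper: kappa * log(c)
def barrier(x: float) -> float:
    return kappa * math.log(x)

# Analytic derivative of the barrier w.r.t. the slack: kappa / c.
# This is the same reciprocal form that appears in the code,
# where 1e-8 is added purely for numerical stability near c = 0.
analytic_grad = kappa / c

# Finite-difference check that the reciprocal form is indeed this gradient
eps = 1e-6
numeric_grad = (barrier(c + eps) - barrier(c - eps)) / (2 * eps)
print(analytic_grad, numeric_grad)
```

This is only an illustration of the mathematical relationship between the two forms, not a claim about the implementation's rationale; the linked issue has the authoritative explanation.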
Hello, thanks for your excellent work. For the IPO algorithm, I wonder whether it is suitable for environments with discrete action spaces, since interior-point methods are not suited to optimization problems whose decision variables are discrete.