PKU-Alignment / Safe-Policy-Optimization

NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms
https://safe-policy-optimization.readthedocs.io/en/latest/index.html
Apache License 2.0

Something about IPO #23

Closed: stvsd1314 closed this issue 1 year ago

stvsd1314 commented 1 year ago

Hello, thanks for your excellent work. For the IPO algorithm, I wonder whether it is suitable for environments with a discrete action space, since the interior-point method is not well suited to optimization problems whose decision variables are discrete.

zmsn-2077 commented 1 year ago

Indeed, all of SafePO's current algorithms only support continuous action spaces. As for using an interior-point method in a discrete setting, we suggest consulting the referenced video for further guidance; I have done very little research on this topic.

  1. Aaron Sidford: Introduction to interior point methods for discrete optimization, lecture II. https://www.youtube.com/watch?v=ivTi_mnMNxw

Our team loves sharing and open source. We also encourage you to try our other two libraries and environments, which are specifically tailored toward constrained optimization and safe reinforcement learning:

  1. Safety-Gymnasium. https://github.com/OmniSafeAI/safety-gymnasium
  2. OmniSafe. https://github.com/OmniSafeAI/omnisafe

stvsd1314 commented 1 year ago

Thanks for your response! By the way, I wonder what "penalty = self.kappa / (c + 1e-8)" means in the policy-loss calculation of IPO. Shouldn't it be in log form, like the equation given in the original paper?

zmsn-2077 commented 1 year ago

For more details, you can refer to https://github.com/OmniSafeAI/omnisafe/issues/223. Feel free to ask us to reopen this issue if you have more questions.
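
For anyone reading this later, one way to see the connection: the IPO paper augments the objective with a log-barrier term, roughly log(d - J_C) / t for cost return J_C and cost limit d, and the gradient of that barrier weights the cost term by 1 / (t * (d - J_C)). A reciprocal penalty like kappa / (c + 1e-8) can therefore be read as that gradient coefficient, with c playing the role of the remaining cost budget. The sketch below only illustrates this reading; the variable names (ep_cost, cost_limit) and the (1 + penalty) normalization are assumptions for illustration, not SafePO's actual code.

```python
import torch


def ipo_policy_loss(ratio, adv, cost_adv, ep_cost, cost_limit,
                    kappa=0.01, eps=1e-8):
    """Illustrative IPO-style surrogate loss (a sketch, not SafePO's exact code).

    The log-barrier term log(cost_limit - J_C) / t from the IPO paper has a
    gradient that scales the cost advantage by 1 / (t * (cost_limit - J_C)),
    which is why a reciprocal penalty can stand in for the log itself.
    """
    budget = cost_limit - ep_cost          # assumed meaning of `c`
    penalty = kappa / (budget + eps)       # barrier-gradient coefficient

    # Importance-sampled surrogate: reward advantage minus the penalized
    # cost advantage, renormalized so the loss scale stays comparable.
    loss = -(ratio * adv - penalty * ratio * cost_adv) / (1 + penalty)
    return loss.mean()
```

How the implementation handles the case where the budget approaches zero (or is violated) is exactly the kind of detail the omnisafe issue above discusses, so treat the snippet as a reading aid rather than a reference implementation.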