PKU-Alignment / omnisafe

JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
https://www.omnisafe.ai
Apache License 2.0

[Question] PPO/TRPO-EarlyTerminated code help #293

Closed Obnayuf closed 9 months ago

Obnayuf commented 9 months ago

Questions

First of all, thank you very much for your code; it has been instrumental in helping me understand Safe Reinforcement Learning. I have two questions. First, I noticed that the paper "Safe Exploration by Solving Early Terminated MDP" uses a context model to help the policy generalize across different initial states, but OmniSafe's PPO/TRPO-EarlyTerminated has no corresponding implementation. May I ask why? And would it be possible to introduce network structures such as RNNs into OmniSafe?
Second, how do you view the difficulty of choosing the cost limit in safe reinforcement learning? In practice, I usually train vanilla RL to find an upper bound on the cost limit, then tune by trial and error until I find a suitable value, which is obviously very "unintelligent". What do you think of the paper "Value constrained model-free continuous control", which, as I understand it, can find a proper cost limit automatically?

Gaiejj commented 9 months ago

The current implementation of OmniSafe does not support a context model. During implementation, we focused on formulating the ET-MDP itself. As the original paper notes, solving an ET-MDP is intuitively similar to solving a normal MDP, since there are no constraints left to consider, so any prevailing algorithm (such as TD3, SAC, PPO, or TRPO) can be applied as the ET-MDP solver. The context model is merely one solution suited to ET-MDPs. That said, I believe incorporating a context model could be very valuable, and we will add it to OmniSafe's to-do list. Thank you for bringing it up.
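To illustrate the ET-MDP idea, here is a minimal sketch of early termination as an environment wrapper. It is not OmniSafe's actual implementation: `ToyCostEnv`, `EarlyTerminatedEnv`, `rollout_length`, and the four-value `step` return are hypothetical names and conventions chosen for this example, and it assumes the episode ends once the cumulative cost exceeds the cost limit.

```python
class ToyCostEnv:
    """Hypothetical stand-in environment: each step yields reward 1.0 and a fixed cost."""

    def __init__(self, cost_per_step=1.0, horizon=100):
        self.cost_per_step = cost_per_step
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 1.0
        cost = self.cost_per_step
        done = self.t >= self.horizon  # normal time-limit termination
        return obs, reward, cost, done


class EarlyTerminatedEnv:
    """Assumed ET-MDP wrapper: end the episode once cumulative cost exceeds the limit."""

    def __init__(self, env, cost_limit):
        self.env = env
        self.cost_limit = cost_limit
        self.cum_cost = 0.0

    def reset(self):
        self.cum_cost = 0.0
        return self.env.reset()

    def step(self, action):
        obs, reward, cost, done = self.env.step(action)
        self.cum_cost += cost
        if self.cum_cost > self.cost_limit:
            done = True  # early termination replaces the explicit constraint
        return obs, reward, cost, done


def rollout_length(env):
    """Step with a dummy action until the episode ends; return the number of steps."""
    env.reset()
    steps, done = 0, False
    while not done:
        _, _, _, done = env.step(None)
        steps += 1
    return steps


if __name__ == "__main__":
    wrapped = EarlyTerminatedEnv(ToyCostEnv(cost_per_step=1.0), cost_limit=25)
    print(rollout_length(wrapped))
```

Because the wrapper only changes when `done` becomes true, an unconstrained solver such as PPO or TRPO can be trained on the wrapped environment without any further changes.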

Indeed, searching for a suitable cost limit does require a grid search. However, OmniSafe already allows some automation of this process: you can use examples/run_experiment_grid.py to specify multiple experiments with different cost limits, and examples/analyze_experiment_results.py to visualize the results for each cost limit. We will also consider your suggestion for a more automated cost-limit search.
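As a rough illustration of what such a sweep enumerates, the sketch below builds one configuration per combination of algorithm, environment, cost limit, and seed. It does not use OmniSafe's experiment-grid API: `build_cost_limit_grid`, the dictionary keys, and the example algorithm, environment, and cost-limit values are assumptions made for this example only.

```python
from itertools import product


def build_cost_limit_grid(algos, env_ids, cost_limits, seeds=(0,)):
    """Return one config dict per (algo, env_id, cost_limit, seed) combination."""
    configs = []
    for algo, env_id, cost_limit, seed in product(algos, env_ids, cost_limits, seeds):
        configs.append(
            {
                # Hypothetical naming scheme so each run can be told apart later.
                "name": f"{algo}-{env_id}-limit{cost_limit:g}-seed{seed}",
                "algo": algo,
                "env_id": env_id,
                "cost_limit": float(cost_limit),
                "seed": seed,
            }
        )
    return configs


# Illustrative values only; pick limits below the cost of an unconstrained baseline.
grid = build_cost_limit_grid(
    algos=["PPOEarlyTerminated", "TRPOEarlyTerminated"],
    env_ids=["SafetyPointGoal1-v0"],
    cost_limits=[5, 10, 25, 50],
)

if __name__ == "__main__":
    for cfg in grid:
        print(cfg["name"])
```

Each dictionary would then be handed to whatever training entry point you use; comparing the resulting return and cost curves across cost limits is the manual step that the analysis script helps with.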

Thank you once again for your insightful proposals.

Obnayuf commented 9 months ago

Thanks for your reply.